Basic test harness and test suite

wryun / es-shell

es: a shell with higher-order functions

http://wryun.github.io/es-shell/

Other

307 stars 25 forks source link

Basic test harness and test suite #71

Closed jpco closed 5 days ago

jpco commented 9 months ago

The way tests work under this setup is this: Each file defines a test function to perform some operation on its arguments and return a result. Each file also contains a number of cases and wants (which must correspond 1:1). The cases are each passed to test and run, and the results are compared against the corresponding wants.

This is clearly a design which favors running each test with many different inputs. I like this -- I think it encourages thoroughness -- but I know it's a particular "style" of test and may not be universally appreciated. Other than that limitation, I think the ability to define your own test function makes things very flexible.

In general, I want to know how people feel about the files under the tests/ directory themselves. Is this a design people find intuitive and useful? If you were implementing a new feature for the shell, would you find it easy to add a test file?

If people like the design, then some obvious follow-ups are:

Far more tests for far more behaviors. Comb the man page for material. Probably, make improvements to test.es along the way.
Enable running the tests under a different es binary than the test harness. (OR, potentially, rewrite the test harness in a language other than es script!)
Makefile integration, so it's easy to run make test to run all the tests.
CircleCI integration -- likely with an alternate test.es specialized for the task. I'd have to study how CircleCI works to get more specific.

memreflect commented 9 months ago

I found a couple of lines that i would personally change, but it certainly seems to work well after trying it out with my own sample tests. That's some really nice code overall in my opinion.

wryun commented 9 months ago

Haven't deeply at the code yet, but I'm a huge fan of having separate test files and the test cases being easy to define.

My instinct if I were writing this would be to lean more on the approach in the trip.es check/errorcheck functions, executing the tests immediately rather than eval-ing them later and relying so much on being inside the framework. The framework would instead add hooks for statistics so that the functions wouldn't abort immediately, and a summarise function at the end. I like simple control flow :)

i.e. something like:

testcase 'echo does an echo' {
  assertoutput a {echo a}
  assertoutput b {echo b}
  assertresult 0 {echo}  # you wouldn't really want this in this case - assume assertresult would already check the return val
}

(where test defines assertoutput/assertresult in the appropriate scope and dumps statistics into some global, then the main runner prints out statistics as appropriate)

But I'm probably just bikeshedding, and I'm super happy to have better tests. I agree that your approach encourages people to put a bunch of different inputs in, which is very nice. I wouldn't like to merge until it became the only way though (i.e. subsuming any tests in trip.es and running happily in CI).

jpco commented 9 months ago

Actually, I'm playing around implementing this testcase/assertoutput setup and it's looking really nice. One thing that didn't occur to me is that inside each testcase, the asserts can be run along with normal code -- so you could if necessary do things like

testcase 'input substitution works' {
  # arbitrary setup code goes here

  if {~ <=$&primitives readfrom} {
    assert-result {test-depending-on-primitive} output
  } {
    assert-result {test-depending-on-workaround} output
  }

  assert-result {common-behavior-test} output
  # more tests and teardown
}

This makes this design at least as flexible as what I've written, and it's a lot more straightforward to actually read and write. I'll take some time on this -- unfortunately, there's some inherent complexity here thanks to challenges like juggling two separate lists cleanly to compare results, and keeping the test harness safe when cases do things like cause panics. So testcase here is likely going to be fairly involved. But I think this design is a winner anyway.

jpco commented 9 months ago

All right, I've changed how the tests work -- and migrated trip.es so we can see how that looks in this new context. Goodness is there a lot of wacky stuff in trip.es.

Tests are invoked like es -s < test.es tests/test-file.es. The current state of things is more than a bit messy, but I didn't want to try to implement the whole world before giving folks a chance to give feedback.

memreflect commented 9 months ago

You used strings in tests/embedded-eq.es and a few other places; why not {blocks} instead?

The report function could be a tiny bit DRYer:

fn report {
if {!~ $text-execution-failure ()} {
  echo test execution failure: $test-execution-failure
} {~ $#failed 0} {
  echo passed!
} {
  echo - $#passed cases passed, $#failed failed.
}
}

Are the other assert-* functions necessary or just conveniences? For example, assert-output currently does not allow reading from the standard input stream, setting it instead to /dev/null, so one must instead manually use something like the following instead of assert-output anyway:
```
assert {~ `^{$cmd-reading-stdin < path/to/input} $^want} $message
```
The conversion of trip.es isn't 1-to-1: line 98 (the cat >[0=] subtest in the "redirections" test) doesn't quite work as originally intended yet. I changed it the string to the simpler 'Bad file descriptor', and the test passed.

While the previous iteration worked well enough as a starting point, this design is certainly an improvement. It just needs a bit of cleanup (e.g. should assert use its message parameter in the pass-case as well? the $got parameter of pass-case can probably disappear since it's unused inside that function).

Slightly off-topic, i think the concept of the with-tmpfile function in the previous iteration of the PR is likely useful when assert is needed instead of assert-output or whatever, and i think you can even use it to implement assert-output and its file-related friends, though i would drop the recursion in favor of more es constructs and re-order the arguments:

fn with-tmpfile cmd filevars {
  let (filepaths = ()) {
    for (_unused_ = $filevars) {
      filepaths = $filepaths `mktemp
    }
    unwind-protect {
      local ($filevars = $filepaths) $cmd
    } {
      rm -f $filepaths
    }
  }
}

wryun commented 9 months ago

Wow, nice, CI integration! At the moment it seems like the test result upload has some minor issue, though (see error message): https://app.circleci.com/pipelines/github/wryun/es-shell/57/workflows/6d889f21-2f89-4aca-b546-d95ad6953487/jobs/59

Another minor issue is that fn test does a report at the end of each group of tests but doesn't reset the passes/fails. This ends up with a somewhat misleading output, IMO:

^=0 (100) gorkd:~/a/es-shell/es-shell [tests]
; ./es -s < test/test.es ./test/run-tests.es ./test/tests/*
eq is assignment in just the right places: passed!
eq is handled correctly in let binders: passed!
eq is handled correctly in with parens in args: passed!
eq is parsed with control flow syntax correctly: passed!
eq is parsed in if correctly: passed!
result results: passed!
matches match: passed!
echo echoes: passed!
redirects redirect: passed!
test 1: passed!
subjects: passed!
error: 
- {$es -c 'mtch () (* break)'} failed got mtch: No such file or directory
 want uncaught exception
- 58 cases passed, 1 failed.
single elements: - 61 cases passed, 1 failed.
multiple elements: - 64 cases passed, 1 failed.
single ranges: - 68 cases passed, 1 failed.
multiple ranges: - 72 cases passed, 1 failed.
open ranges: - 75 cases passed, 1 failed.
es -c: - 77 cases passed, 1 failed.
warning: ./test/tests/trip.es:9: null character ignored
lexical analysis: - 87 cases passed, 1 failed.
tokenizer errors: - 94 cases passed, 1 failed.
blow the input stack: - 95 cases passed, 1 failed.
umask: - 102 cases passed, 1 failed.
redirections: - 109 cases passed, 1 failed.
exceptions: - 110 cases passed, 1 failed.
heredocs and herestrings: - 120 cases passed, 1 failed.
tilde matching: - 123 cases passed, 1 failed.
flat command expansion: - 128 cases passed, 1 failed.
equal sign in command arguments: - 134 cases passed, 1 failed.

Finally, the main thing that make me reluctant to merge this is that I would appreciate a simpler interface and complete replacement of trip. That is:

remove make trip and trip.es and have make test and make test-circleci do the appropriate things
consider making the entry method to run-tests.es less surprising. IMO it would be better to just have a 'test.es' act more like a normal program, with an option to generate circleci output. The 'pipe in some definitions' method is cute, but I'm not sure it's useful.

Thanks heaps for all the work/thought on this.

jpco commented 7 months ago

I left this for a while because I got overwhelmed with it. I'm coming back now with fresh eyes and I think I'm gonna try and reduce the scope of this PR to avoid getting overwhelmed again. For this PR I want to:

Add the test files tests/example.es (which verifies and demoes the test mechanism), tests/trip.es, and tests/match.es (which together replace the current trip.es).
Add assert, but none of the assert-* functions, which have their own fiddly design considerations I don't want to deal with along with everything else.
Get both make test and CircleCI integration working -- I could put off CircleCI integration for a later PR, but I consider it kind of the point of all this, so I'd rather get it in early.
Remove make trip and the original trip.es in order to centralize things on the new test mechanism.

Things which are non-goals for this PR:

Good test coverage in terms of lines of the man page
Robustness in the face of shell-crashing bugs
Helpers for making test-writing convenient (e.g., the assert-* functions)

I'll simplify the entry point as part of this. Everyone's feedback asking me to reduce the "weird magic" in here has been correct and much appreciated.

wryun commented 7 months ago

:+1: appreciate all the PRs. Sorry I haven't been getting to them. Hopefully soon! Also, please don't view me as a source of authority. If you've had what you perceive as a non-controversial PR open for a few weeks and you think it should be merged, just do so.

(non-controversial = bugfix without significant refactoring)

jpco commented 5 days ago

Hey @wryun, how do you feel about this PR now? Any particular concerns? I'd love to get this in and start setting up more test coverage, including regression tests for some of the several bug fixes that have gone in lately.

wryun commented 5 days ago

Let's do it. Thanks!