racket / rackunit

Other
18 stars 32 forks source link

Failures not detected in test-suite -> test-case -> test-* forms #136

Closed yurkobb closed 2 years ago

yurkobb commented 3 years ago

Perhaps related to #98:

#lang racket/base

(require rackunit
         rackunit/text-ui)

(define tests
  (test-suite
   "Top"
   (test-case "Test case"
     (test-equal? "A is equal? to B"
                  "A" "B"))))

(run-tests tests)
$ raco test rackunit-test.rkt
raco test: "rackunit-test.rkt"
--------------------
Top > A is equal? to B
FAILURE
name:       check-equal?
location:   rackunit-test.rkt:10:5
actual:     "A"
expected:   "B"
--------------------
1 success(es) 0 failure(s) 0 error(s) 1 test(s) run
0
1 test passed

Also notice that "Test case" is missing from the output. It should probably be Top > Test case > A is equal? to B.

(Racket 7.8).

countvajhula commented 2 years ago

This does seem related to #98 and "nested" test cases but there's more to it. Some observations using the example in the description:

  1. If we replace test-equal? with the equivalent check-equal? form, it reports the fail count correctly.
(define tests
  (test-suite
   "Top"
   (test-case "Test case"
     (check-equal? "A"
                   "B" "A is equal? to B"))))

=> 0 success(es) 1 failure(s) 0 error(s) 1 test(s) run
  1. If we remove the test-case altogether, and replace the test with a simple check-equal? form, it once again reports the count incorrectly:
(define tests
  (test-suite
   "Top"
   (check-equal? "A"
                 "B" "A is equal? to B")))

=> 1 success(es) 0 failure(s) 0 error(s) 1 test(s) run

Personally, I didn't even know that test-equal? existed, which is a shortcut for (test-case ... (check-equal? ... )), and have been using the second form (i.e. check-* within test-suite) in my tests extensively, in addition to using test-case.

Note that even though the check doesn't count as failed in the summary, it does still count as one test. Having n checks results in n test(s) run. I think we'd agree that it should either count as a test and also count for success/failure or it should count for neither, but not count for one and not the other. So some possible ways in which it should work are:

  1. check-* forms that aren't contained in a test-case -- whether they are enclosed in a test-suite or not -- should be counted individually, i.e. as test cases, for success or failure. test-case is an optional wrapper to add additional context. This would avoid the need to rewrite existing top level checks to wrap them in test-case* forms in order to have them count. The other possibility is to disallow checks that aren't enclosed in some form of test-case form. From my vantage point, I'd prefer the former, especially since it preserves backwards compatibility.

  2. Re: nested forms, I actually really like the flexibility of being able to nest forms with abandon. But it seems that rackunit is somewhere between two versions of itself here. On the one hand, if we allow arbitrary nesting, why not just have a single test form like test-case (in addition to the check- forms) which can be nested to arbitrary depth, and the counting can just be all the terminals, i.e. checks. In this solution, test-case means "the list of tests", whose members can either be test-cases or checks (providing the base case). On the other hand, if we have both test-suite and test-case to capture "the list of tests" and "a specific test", respectively, we could say that since test-suites can already be nested, that nesting a test-case isn't necessary -- we could just replace any sequence of (test-case ... (test-case ...)) with (test-suite ... (test-case ...)), with only the terminal being a test-case and all the enclosing ones being test-suites. In this case, rackunit could signal a compile error if a test-case-within-a-test-case is encountered, as the author suggested in #98 , or it could just treat the whole thing as one test, even if arbitrarily nested.

On point 2, I think I'd lean towards coalescing test-case and test-suite to be just one form, maybe aliasing one to the other, so that they can be arbitrarily nested. It might also be easier to define higher-level operations on tests this way, e.g. composing two tests to create a union test, for whatever that may be worth, or extracting the component of tests that failed as a fresh test programmatically. Not sure if this could work, but it might enable a way to programmatically iterate a test-fix-rerun loop. But since the source identifiers are statically bound, I'm not sure if this would be useful unless we could dynamically load those identifiers to re-run an evolving set of tests. Now this is just crazy talk 😄

countvajhula commented 2 years ago

I'd further suggest closing this issue and merging the discussion into #98 . I've reposted my comment there.