jorgenschaefer / emacs-buttercup

Behavior-Driven Emacs Lisp Testing
GNU General Public License v3.0

after-each overwrite pending state #149

Closed DarwinAwardWinner closed 4 years ago

DarwinAwardWinner commented 4 years ago

I'm writing a test suite for ido. Using current buttercup, I get the following output (also visible on Travis here):

$ cask exec buttercup -L included-ido tests
Ido location: "/home/travis/build/DarwinAwardWinner/ido-test-suite/included-ido/ido.el"
Running 17 out of 21 specs.
ido
  ido-completing-read
    should accept a matching completion (80.89ms)
    should complete with a matching item on RET (34.91ms)
    should complete with the first match when multiple matches are available (24.92ms)
    should allow <left> and <right> to cycle completions, with wrap-around (35.82ms)
    should allow C-j to exit with a partial completion (35.29ms)
  ido file operations
    ido-read-file-name
      Should allow selecting an existing file with RET (49.05ms)
      Should allow falling back to default completion with C-f (49.16ms)
    ido-read-directory-name
      Should allow selecting an existing directory with RET (48.64ms)
      Should not allow selecting an existing non-directory file with RET (35.25ms)
    ido-find-file
      should allow finding an existing file with RET (42.14ms)
      should allow finding new files (51.91ms)
    ido-write-file
      should allow writing new files with RET RET (37.55ms)
      should allow overwriting existing files with RET y (20.46ms)
    ido-dired
  ido buffer operations
    ido-read-buffer
      should allow selecting an existing buffer with RET (36.32ms)
      should allow selecting a new buffer name with RET RET (35.77ms)
      should allow falling back to default completion with C-f (43.21ms)
    ido-switch-buffer
      should allow switching to an existing buffer with RET (33.74ms)
      should allow creating a new buffer with RET RET (35.18ms)
  regression tests
    should not exhibit bug #19412 (19.45ms)
    should not exhibit bug #11861 (28.42ms)
    should not exhibit bug #1516 (20.58ms)
Ran 21 specs, 0 failed, in 0.8 seconds.

Buttercup correctly indicates at the top that several specs are not run, but at the bottom it just says "ran 21 specs", and every spec is output in green with timing info adjacent, which means I can't tell which specs were actually run.

Looking at the code for this commit, you can see that the last 3 tests are declared with xit, which means they should be marked as PENDING.

DarwinAwardWinner commented 4 years ago

Side note: I wonder if there's a way to subtract out what seems to be about 20 ms of overhead from each test's timing value.

snogge commented 4 years ago

For me, running make test in emacs-buttercup does print pending/skipped tests in yellow and mark them with PENDING. Can you try that on your system? I also don't see the 20ms overhead.

It should be possible to list pending tests in the trailer, if that helps.

Do you have the tests somewhere I can download and try them?

Output from emacs-buttercup:

...

A spec
  is just a function, so it can contain any code (0.06ms)
  can have more than one expectation (0.11ms)
  nested inside a second describe
    can reference both scopes as needed (0.07ms)

A spec
  is just a function, so it can contain any code  PENDING (0.05ms)

Pending specs
  can be declared using `xit'  PENDING (0.04ms)
  can be declared with `it' but without a body  PENDING (0.05ms)

A spy
  tracks that the spy was called (0.04ms)
  tracks all arguments of its calls (0.07ms)
  stops all execution on a function (0.07ms)

...
DarwinAwardWinner commented 4 years ago

The tests are here: https://github.com/DarwinAwardWinner/ido-test-suite

Oddly, running buttercup's own test suite shows the pending tests just fine, but running this specific test suite doesn't, either on my local machine or on Travis.

snogge commented 4 years ago

I'm such a klutz.

snogge commented 4 years ago

It's something that is triggered by your before-each and after-each functions. If I remove them the tests are properly marked as pending. Of course, some of the tests are failing, but they are properly reported :)

Edit: So it's not your after-each that is the problem. It seems any after-each will trigger this problem. Here is a minimal reproducer:

(describe "This test"
  (after-each
    (ignore))
  (xit "will not be marked pending"))

Note that before-each does not trigger this problem.

snogge commented 4 years ago

So I found the problem. buttercup--run-spec uses buttercup--update-with-funcall to run all the before-each functions, the actual spec body function, and the after-each functions. Each of these calls updates the spec result if it is already success or pending, so any after-each function will overwrite a pending state. I'm not sure whether it is a good thing to track the status of before-each and after-each functions, but I guess at least failing functions should be tracked.
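
Roughly, the mechanism looks like this (a simplified, self-contained sketch with hypothetical demo- names, not buttercup's actual internals): every phase writes its own outcome into the same status field, so a cleanly passing after-each replaces the pending status the body just set.

(require 'cl-lib)

(define-error 'demo-pending "Spec is pending")

(cl-defstruct demo-spec status)

(defun demo-update-with-funcall (spec fn)
  "Run FN and unconditionally record its outcome in SPEC's status slot."
  (condition-case nil
      (progn (funcall fn)
             (setf (demo-spec-status spec) 'passed))
    (demo-pending (setf (demo-spec-status spec) 'pending))
    (error (setf (demo-spec-status spec) 'failed))))

(let ((spec (make-demo-spec)))
  ;; The spec body signals pending, as an `xit' body would ...
  (demo-update-with-funcall spec (lambda () (signal 'demo-pending "PENDING")))
  ;; ... and a passing after-each then clobbers that status.
  (demo-update-with-funcall spec #'ignore)
  (demo-spec-status spec))   ; => passed, the pending state is lost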

DarwinAwardWinner commented 4 years ago

Ok, so it seems that the fix is to prevent a "passed" status from overwriting a previously set "pending" status? Independently of that, are before-each and after-each forms run on pending specs? It seems like they shouldn't be. I'll have a go at fixing both issues when I get a chance.

DarwinAwardWinner commented 4 years ago

Regarding my second question, I suppose it's not possible to know ahead of time whether a particular spec will throw a buttercup-pending signal, so all the before forms need to be run just in case. And if the before forms are run, then the after forms must also be run. So, not very efficient, but it seems like the only correct way to implement it.

snogge commented 4 years ago

I have a fix, I just need to polish the tests a bit. Edit: And I wrote that without refreshing and seeing your PR. I'll have a look later.

snogge commented 4 years ago

@jorgenschaefer, do you have any input? Is there a reason for setting the test state from the -each and -all functions, or is that just a side-effect of code reuse?

DarwinAwardWinner commented 4 years ago

I assume it's because an error during the before/after steps should count as a test failure even if the test body succeeds. However, I'm not certain whether failed or pending ought to take precedence.

jorgenschaefer commented 4 years ago

@DarwinAwardWinner is right, errors in before/after functions should propagate. I guess the fix is indeed to not override pending with success, but error should override both …

DarwinAwardWinner commented 4 years ago

After thinking a bit more, I agree: failed should take precedence over pending status. Otherwise, imagine you have a pending test A whose teardown code throws an error and fails to clean up properly, and that leaked state in turn causes test B to fail. If you don't allow the teardown error to mark test A as failed, you will get a result saying that A is pending and B failed, but you won't be able to track down the source of the error in test B, because the error in A was never reported.

Obviously it's a bit unintuitive that a pending test can fail, but it's less unintuitive than potentially hiding the true source of an error in a subsequent test.

snogge commented 4 years ago

I've created a new issue #157 to track the handling of state from *-each. Fixing that properly will require a bunch of new tests and will take a little time. On the other hand we should fix this issue now as we have at least two suggested fixes.

#156 has been updated with a new solution for this issue. My idea is to clear the state of each spec before running it.

This should let us eliminate a few corner cases in the code, as you can see in the PR I've already removed two.

DarwinAwardWinner commented 4 years ago

I'm not sure I understand how resetting the spec at the start of buttercup--run-spec solves this issue. That said, the tests passed, so presumably it's fixing it somehow.

snogge commented 4 years ago

The reason we had to check for specs with status pending in buttercup--update-with-funcall is that pending specs have their status set to pending when they are created. By having a well-defined starting state (success), we do not need to check for that. But we do need to set the state to pending to print the initial "Running X out of Y specs" line in the report. Once the reporter has been called with buttercup-started, the count is no longer needed, but it is in fact recreated during the run.

DarwinAwardWinner commented 4 years ago

Ok, so the current state of #156 allows either pending or failed to overwrite passed, but not to overwrite each other. Then, it ensures that each test starts in the passed state, so the first state change to either pending or failed is permanent. That should work for most things, although it will break in the very rare case of an error being thrown from the after-each form of a pending spec (in which case the spec will be marked as pending when it should be marked failed).
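
In other words, something like this (a tiny hypothetical sketch, with specs as plists rather than buttercup's structs, not the actual #156 code):

(defun demo-record-status (spec new-status)
  "Record NEW-STATUS for SPEC unless SPEC has already left `passed'."
  (when (eq (plist-get spec :status) 'passed)
    (plist-put spec :status new-status))
  spec)

;; (demo-record-status (list :status 'passed) 'pending)  ; => (:status pending)
;; (demo-record-status (list :status 'pending) 'failed)  ; => (:status pending)
;;   ^ the rare corner case mentioned above: a failing after-each of a
;;     pending spec stays pending instead of being marked failed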

I think that's good enough for a quick fix, although the full fix might need to be a bit fancier.

DarwinAwardWinner commented 4 years ago

As for the full fix, here are my thoughts:

I wasn't aware that pending specs were initially created with pending status, and that this status is used to generate the counts at the start of the report. If this is the case, then does that mean specs aren't supposed to throw their own pending signals? I.e. is this incorrect?

(it "is actually a pending test"
  (signal 'buttercup-pending "PENDING"))

(Obviously real-life usage would presumably be more complex, perhaps using it to skip a test that requires an optional dependency?)

Assuming this kind of pending test is not supported, and given that xit sets the test status to pending initially, it seems like it should be possible to skip running before-each and after-each forms for pending tests. If so, I think that fixes this issue for real.

DarwinAwardWinner commented 4 years ago

Or to put it more succinctly, if pending specs will always have their status set to pending before buttercup--run-spec is called, then maybe buttercup--run-spec should essentially be a no-op on pending specs.

snogge commented 4 years ago

An xit form will replace the test body with a buttercup-pending signal, so

(xit "description"
  (expect ...))

is the same as

(it "description"
  (signal 'buttercup-pending "PENDING"))

likewise for empty it forms.

The buttercup-pending signals are then used to mark the tests as pending (or skipped, which uses (signal 'buttercup-pending "SKIPPED") to set that status). So the summary at the end of the test run uses that count instead.

There is also an assume macro, which expands to the buttercup-skip function; that function also signals buttercup-pending.

So I guess my point is that signaling buttercup-pending (or other symbols) is the correct way to set a spec as pending, but we do a little hack at the beginning to provide pretty reports.
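
For example, a spec that can only decide at runtime whether it applies might look like this (the optional library and its function are made up for illustration):

(describe "an optional feature"
  (it "is exercised only when the optional library is available"
    ;; `some-optional-lib' and its function are hypothetical.
    (assume (featurep 'some-optional-lib) "some-optional-lib not installed")
    (expect (some-optional-lib-frobnicate) :to-equal 42)))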

DarwinAwardWinner commented 4 years ago

Ok, so it really isn't possible to know ahead of time which tests might signal buttercup-pending, which means we really do need to run before-each and after-each forms for every test. Which means we do need the full logic of failed > pending > passed.
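
Something along these lines (a hypothetical sketch, not buttercup code): a later result only replaces the current one if it ranks higher.

(defun demo-merge-status (current new)
  "Return whichever of CURRENT or NEW should win for the spec."
  (let ((rank '((passed . 0) (pending . 1) (failed . 2))))
    (if (> (alist-get new rank 0) (alist-get current rank 0))
        new
      current)))

;; (demo-merge-status 'pending 'passed) ; => pending
;; (demo-merge-status 'pending 'failed) ; => failed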

snogge commented 4 years ago

There may be more to it than that, which is one of the reasons to handle it in a separate issue.

I'm a bit concerned about how assume will fit with before-each and after-each. If assume is used to skip tests that depend on some optional feature, we really shouldn't run other pieces of the tests that may use the missing feature. I've yet to work up any realistic examples, but it's something to think about.

DarwinAwardWinner commented 4 years ago

Should we switch to discussing this in #157 then?

snogge commented 4 years ago

As you've probably noticed I have not merged any of the PRs yet, and that is because the more I think about it the more I think neither fixes the problem properly. The solution in #156 will also make solving some other things such as #161, #166, #164, and #158 hairier.

I'm leaning more and more towards not actually running the pending specs - including any before-each and after-each functions - at all. In the same vein, suites that only contain pending tests should not run their before-all and after-all functions. The reason for this is that the setup and teardown can be costly, and actually be wrong if the spec is skipped. I still have some doubts about what will happen when there are expect calls inside after-each or after-all blocks. Any such expects will probably fail for skipped tests, and that will set the spec to failed instead of skipped. We've not seen any such report which I guess means users don't do this type of thing.
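
For instance, the kind of thing that would trip over this (a made-up spec):

(describe "a buffer-producing feature"
  (after-each
    ;; This expectation assumes the body actually ran; for a skipped
    ;; spec the buffer never exists, so it fails and the spec would be
    ;; reported as failed rather than pending.
    (expect (get-buffer "*demo-output*") :to-be-truthy)
    (when (get-buffer "*demo-output*")
      (kill-buffer "*demo-output*")))
  (xit "creates *demo-output*"
    (get-buffer-create "*demo-output*")))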

DarwinAwardWinner commented 4 years ago

I agree that running nothing at all for pending specs/suites is the ideal solution. The problem is that in order to do that, we need to know whether a spec is pending without running it. We can certainly do this for xdescribe and xit. But beyond those, we can't determine whether a test is pending until it runs, because any test is allowed to signal buttercup-pending. And we can't run the test without running the before forms first, and we can't run the before forms without also running the after forms to clean up after them.

As long as tests are allowed to mark themselves as pending/skipped at runtime, I can't think of a way to get around this. And that means that even if we skip everything for xit and xdescribe tests, we still need to handle the case where a test with before/after forms signals buttercup-pending, including cases when one of those before/after forms signals an error or pending status as well.

Do you see anything I'm missing here?

snogge commented 4 years ago

I think we can do both: the ones marked as pending with xit or skipped with the -p option are not run. The rest must obviously be run, and it is OK if they signal pending. We still have to handle the situation where any after function signals an error, though. I'll try to implement this and see how it goes. Something I didn't mention in my last comment is that all the other things in the works will probably be harder to do with my reset-spec-state implementation.
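
Conceptually, something like this (hypothetical helper, with specs as plists rather than buttercup's structs):

(defun demo-maybe-run-spec (spec run-fn report-fn)
  "Run SPEC with RUN-FN unless it is already pending, then report it.
Specs already marked pending at declaration time (xit, empty it,
-p skips) never touch their before/after functions."
  (unless (eq (plist-get spec :status) 'pending)
    (funcall run-fn spec))
  (funcall report-fn spec))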

DarwinAwardWinner commented 4 years ago

Would it be useful to distinguish internally between tests skipped at declaration time (i.e. xit etc.) vs. tests skipped at runtime (e.g. by calling buttercup-skip)? I.e. have different statuses for each? They would still both be reported as pending tests, but internally it would be 2 distinct statuses.