Side note: I wonder if there's a way to subtract out what seems to be about 20 ms of overhead from each test's timing value.
For me, running `make test` in emacs-buttercup does print pending/skipped tests in yellow and mark them with PENDING. Can you try that on your system?
I also don't see the 20ms overhead.
It should be possible to list pending tests in the trailer, if that helps.
Do you have the tests somewhere I can download and try them?
Output from emacs-buttercup:
```
...
A spec
  is just a function, so it can contain any code (0.06ms)
  can have more than one expectation (0.11ms)
  nested inside a second describe
    can reference both scopes as needed (0.07ms)

A spec
  is just a function, so it can contain any code  PENDING (0.05ms)

Pending specs
  can be declared using `xit'  PENDING (0.04ms)
  can be declared with `it' but without a body  PENDING (0.05ms)

A spy
  tracks that the spy was called (0.04ms)
  tracks all arguments of its calls (0.07ms)
  stops all execution on a function (0.07ms)
...
```
The tests are here: https://github.com/DarwinAwardWinner/ido-test-suite
Oddly, running buttercup's own test suite shows the pending tests just fine, but running this specific test suite doesn't, either on my local machine or on Travis.
I'm such a klutz.
It's something that is triggered by your `before-each` and `after-each` functions. If I remove them the tests are properly marked as pending. Of course, some of the tests are failing, but they are properly reported :)
Edit: So it's not your `after-each` that is the problem. It seems any `after-each` will trigger this problem. Here is a minimal reproducer:
(describe "This test"
(after-each
(ignore))
(xit "will not be marked pending"))
Note that `before-each` does not trigger this problem.
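For comparison (per the note above), the same shape with only a `before-each` is reported as pending as expected:

```elisp
(describe "This test"
  (before-each
    (ignore))
  (xit "is still marked pending"))
```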
So I found the problem. `buttercup--run-spec` uses `buttercup--update-with-funcall` to run all `before-each` functions, the actual spec body function, and the `after-each` functions. Each of these calls updates the spec result if it is already `success` or `pending`, so any `after-each` function will overwrite a `pending` state. I'm not sure whether it is a good thing to track the status of `before-each` and `after-each` functions, but I guess at least failing functions should be tracked.
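A rough sketch (not buttercup's actual code; all names are hypothetical, and it assumes buttercup is loaded so the `buttercup-pending` error symbol exists) of the behaviour described above: each phase writes its own result into the spec without consulting what is already there, so a successful `after-each` turns a `pending` spec back into a passing one.

```elisp
(defun my--run-phases-naively (before-each-fns spec-body after-each-fns)
  "Hypothetical sketch, not buttercup's real code.
Run all phases and return the final status, letting each phase
overwrite whatever status the previous one produced."
  (let (spec-status)
    (dolist (fn (append before-each-fns (list spec-body) after-each-fns))
      (let ((result (condition-case nil
                        (progn (funcall fn) 'passed)
                      (buttercup-pending 'pending)
                      (error 'failed))))
        ;; The bug discussed above: the previous status is not checked,
        ;; so a passing `after-each' clobbers a `pending' spec body.
        (setq spec-status result)))
    spec-status))

;; (my--run-phases-naively
;;  nil
;;  (lambda () (signal 'buttercup-pending "PENDING"))
;;  (list #'ignore))
;; ;=> passed, even though the body signalled pending
```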
Ok, so it seems that the fix is to prevent a "passed" status from overwriting a previously-set "pending" status? Independently of that, are `before-each` and `after-each` forms run on pending specs? It seems like they shouldn't be. I'll have a go at fixing both issues when I get a chance.
Regarding my second question, I suppose it's not possible to know ahead of time whether a particular spec will throw a `buttercup-pending` signal, so all the before forms need to be run just in case. And if the before forms are run, then the after forms must also be run. So, not very efficient, but it seems like the only correct way to implement it.
I have a fix, I just need to polish the tests a bit. Edit: And I wrote that without refreshing and seeing your PR. I'll have a look later.
@jorgenschaefer, do you have any input? Is there a reason for setting the test state from the -each and -all functions, or is that just a side-effect of code reuse?
I assume it's because an error during the before/after steps should count as a test failure even if the test body succeeds. However, I'm not certain whether failed or pending ought to take precedence.
@DarwinAwardWinner is right, errors in before/after functions should propagate. I guess the fix is indeed to not override `pending` with `success`, but `error` should override both …
After thinking a bit more, I agree, failed should take precedence over pending status. Otherwise imagine you have pending test A whose teardown code throws an error and fails to properly clean up, and that leaked state in turn causes test B to fail. If you don't allow the teardown error to mark test A as failed, you will get a result that A is pending and B failed, but you won't be able to track down the source of the error in test B because the error in A was never reported.
Obviously it's a bit unintuitive that a pending test can fail, but it's less unintuitive than potentially hiding the true source of an error in a subsequent test.
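A minimal sketch of the precedence being converged on here (a hypothetical helper, not part of buttercup's API): `failed` beats `pending`, and both beat `passed`, so a later phase can only ever make the result "worse".

```elisp
(defun my--merge-status (old new)
  "Return whichever of OLD and NEW has higher precedence."
  (let ((rank '((failed . 2) (pending . 1) (passed . 0))))
    (if (> (alist-get new rank 0) (alist-get old rank 0))
        new
      old)))

;; (my--merge-status 'pending 'passed)  ;=> pending
;; (my--merge-status 'pending 'failed)  ;=> failed
```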
I've created a new issue #157 to track the handling of state from `*-each`. Fixing that properly will require a bunch of new tests and will take a little time. On the other hand, we should fix this issue now, as we have at least two suggested fixes.
This should let us eliminate a few corner cases in the code; as you can see in the PR, I've already removed two.
I'm not sure I understand how resetting the spec at the start of `buttercup--run-spec` solves this issue. The tests passed, though, so presumably it's fixing it somehow.
The reason we had to check for specs with status pending in `buttercup--update-with-funcall` is that pending specs will have their status set to `pending` when they are created. By having a well-defined starting state (`success`) we do not need to check for that. But we do need to set the state to pending to print the initial "Running X specs of Y" line in the report. Once the reporter has been called with `buttercup-started` the count is no longer needed, and is in fact recreated during the run.
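A rough sketch of the reset described above (hypothetical names and a stand-in struct, not the exact PR code): give every spec a well-defined starting state when it is run, so the pre-set `pending` status only matters for the initial "Running X specs of Y" count.

```elisp
(require 'cl-lib)

(cl-defstruct my-spec
  status)  ; hypothetical stand-in for buttercup's spec object

(defun my--reset-and-run-spec (spec run-fn)
  "Reset SPEC to the known base state, then run it with RUN-FN."
  (setf (my-spec-status spec) 'passed)  ; well-defined starting state
  (funcall run-fn spec))
```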
Ok, so the current state of #156 allows either `pending` or `failed` to overwrite `passed`, but not to overwrite each other. Then, it ensures that each test starts in the `passed` state, so the first state change to either `pending` or `failed` is permanent. That should work for most things, although it will break in the very rare case of an error being thrown from the `after-each` form of a pending spec (in which case the spec will be marked as `pending` when it should be marked `failed`).
I think that's good enough for a quick fix, although the full fix might need to be a bit fancier.
As for the full fix, here are my thoughts:
I wasn't aware that pending specs were initially created with `pending` status, and that this status is used to generate the counts at the start of the report. If this is the case, then does that mean specs aren't supposed to throw their own pending signals? I.e. is this incorrect?
(it "is actually a pending test"
(signal 'buttercup-pending "PENDING"))
(Obviously real-life usage would presumably be more complex, perhaps using it to skip a test that requires an optional dependency?)
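A hedged example of the "real-life usage" imagined above: skipping a spec at runtime when an optional dependency is missing. Whether this is a supported idiom is exactly the open question; `foo` and `foo-do-something` are made-up names.

```elisp
(it "uses the optional `foo' library"
  ;; Skip at runtime if the hypothetical optional dependency is absent.
  (unless (require 'foo nil t)
    (signal 'buttercup-pending "PENDING"))
  (expect (foo-do-something) :to-be-truthy))
```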
Assuming this kind of pending test is not supported, and given that `xit` sets the test status to `pending` initially, it seems like it should be possible to skip running `before-each` and `after-each` forms for pending tests. If so, I think that fixes this issue for real.
Or to put it more succinctly: if pending specs will always have their status set to `pending` before `buttercup--run-spec` is called, then maybe `buttercup--run-spec` should essentially be a no-op on pending specs.
An `xit` form will replace the test body with a `buttercup-pending` signal, so

```elisp
(xit
  (expect ...))
```

is the same as

```elisp
(it
  (signal 'buttercup-pending "PENDING"))
```

Likewise for empty `it` forms.
The `buttercup-pending` signals are then used to mark the tests as pending (or skipped, which uses `(signal 'buttercup-pending "SKIPPED")` to set that status). So the summary at the end of the test run uses that count instead.
There is also an `assume` macro which expands to the `buttercup-skip` function, which also signals `buttercup-pending`.
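For illustration (hedged; `foo` and `foo-do-something` are made-up names, and the exact message handling may differ): `assume` skips the rest of the spec when its condition is nil.

```elisp
(it "only runs when the optional feature is available"
  (assume (featurep 'foo) "foo is not installed")
  (expect (foo-do-something) :to-be-truthy))
```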
So I guess my point is that signaling `buttercup-pending` (or other symbols) is the correct way to set a spec as pending, but we do a little hack at the beginning to provide pretty reports.
Ok, so it really isn't possible to know ahead of time which tests might signal `buttercup-pending`, which means we really do need to run `before-each` and `after-each` forms for every test. Which means we do need the full logic of `failed` > `pending` > `passed`.
There may be more to it than that, which is one of the reasons to handle it in a separate issue.
I'm a bit concerned about how `assume` will fit with `before-each` and `after-each`. If `assume` is used to skip tests that depend on some optional feature, we really shouldn't run other pieces of the tests that may use the missing feature. I've yet to work up any realistic examples, but it's something to think about.
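A hypothetical sketch of the interaction worried about above (all `foo-*` names are made up): the spec skips itself with `assume`, but the `after-each` still references the missing optional feature and would error on its own.

```elisp
(describe "A suite that depends on the optional `foo' feature"
  (before-each
    (when (featurep 'foo)
      (foo-setup)))          ; hypothetical setup for the optional feature
  (after-each
    (foo-teardown))          ; runs even when `foo' was never set up
  (it "exercises foo"
    (assume (featurep 'foo) "foo not installed")
    (expect (foo-do-something) :to-equal 42)))
```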
Should we switch to discussing this in #157 then?
As you've probably noticed I have not merged any of the PRs yet, and that is because the more I think about it the more I think neither fixes the problem properly. The solution in #156 will also make solving some other things such as #161, #166, #164, and #158 hairier.
I'm leaning more and more towards not actually running the pending specs - including any before-each and after-each functions - at all. In the same vein, suites that only contain pending tests should not run their before-all and after-all functions.
The reason for this is that the setup and teardown can be costly, and actually be wrong if the spec is skipped.
I still have some doubts about what will happen when there are `expect` calls inside `after-each` or `after-all` blocks. Any such `expect`s will probably fail for skipped tests, and that will set the spec to failed instead of skipped. We've not seen any such report, which I guess means users don't do this type of thing.
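A hedged illustration of the pattern in question (the state variable is hypothetical): an `expect` used as a post-condition check in `after-each`. For a skipped spec the setup never ran, so the teardown expectation would fail and flip the spec from skipped to failed.

```elisp
(defvar my--state nil
  "Hypothetical state set up in `before-each' and checked in `after-each'.")

(describe "A suite with an assertion in its teardown"
  (before-each (setq my--state 'ready))
  (after-each (expect my--state :to-be 'ready))
  (xit "is skipped, but the teardown expectation would still run"))
```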
I agree that running nothing at all for pending specs/suites is the ideal solution. The problem is that in order to do that, we need to know whether a spec is pending without running it. We can certainly do this for `xdescribe` and `xit`. But for any other test, we can't determine the pending status until it runs, because any test is allowed to signal `buttercup-pending`. And we can't run the test without running the before forms first, and we can't run the before forms without also running the after forms to clean up after them.
As long as tests are allowed to mark themselves as pending/skipped at runtime, I can't think of a way to get around this. And that means that even if we skip everything for `xit` and `xdescribe` tests, we still need to handle the case where a test with before/after forms signals `buttercup-pending`, including cases where one of those before/after forms signals an error or pending status as well.
Do you see anything I'm missing here?
I think we can do both: the ones marked as pending with `xit` or skipped with the `-p` option are not run. The rest must obviously be run, and it is OK if they signal pending. We still have to handle the situation where any after-function signals an error, though.
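A rough sketch of that plan (hypothetical names, reusing the `my-spec` stand-in from the earlier sketch, not the actual implementation): specs already marked pending at declaration time are not run at all, while everything else runs normally and may still signal pending from its body.

```elisp
(defun my--maybe-run-spec (spec run-fn)
  "Run SPEC with RUN-FN unless it was declared pending up front."
  (if (eq (my-spec-status spec) 'pending) ; set by `xit' or the -p filter
      spec                                ; skip before/body/after entirely
    (funcall run-fn spec)))               ; normal run; may still signal pending
```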
I'll try to implement this and see how it goes.
Something I didn't mention in my last comment is that all the other things in the works will probably be harder to do with my reset-spec-state implementation.
Would it be useful to distinguish internally between tests skipped at declaration time (i.e. `xit` etc.) vs. tests skipped at runtime (e.g. by calling `buttercup-skip`)? I.e. have different statuses for each? They would still both be reported as pending tests, but internally there would be 2 distinct statuses.
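One way to express the distinction suggested above (hypothetical names; both kinds would still be reported to the user as pending):

```elisp
(defconst my-spec-pending-kinds
  '(declared-pending   ; marked at declaration time: `xit', `xdescribe', the -p filter
    runtime-pending))  ; signalled while running: `assume', `buttercup-skip'
```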
I'm writing a test suite for ido. Using current buttercup, I get the following output (also visible on Travis here):
Buttercup correctly indicates at the top that several specs are not run, but at the bottom it just says "ran 21 specs", and every spec is output in green with timing info adjacent, which means I can't tell which specs were actually run.
Looking at the code for this commit, you can see that the last 3 tests are declared with `xit`, which means they should be marked as PENDING.