schuessf closed this pull request 2 months ago
When you run a test suite with skipped tests, can you infer from the test suite result alone which tests were skipped? For example, by looking into the `TEST-*.xml`?
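For reference, the Surefire/JUnit report format does record skips: a skipped test case carries a `<skipped/>` child element in the `TEST-*.xml` file. A rough illustration (names and values are made up, not taken from this PR):

```xml
<testsuite name="de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite"
           tests="3" failures="0" errors="0" skipped="1">
  <!-- a passing test: no child element -->
  <testcase name="somePassingTest" classname="RegressionTestSuite" time="1.751"/>
  <!-- a skipped test: marked with a <skipped/> element, optionally with a message -->
  <testcase name="someSkippedTest" classname="RegressionTestSuite" time="0.0">
    <skipped message="expected failure: overapproximation"/>
  </testcase>
</testsuite>
```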
@schuessf Is there a reason why this isn't merged?
Okay, I revisited this PR after a long time 🙃 and updated a few files. These are the downsides and open questions: How should the skip file be named (`.skip`)? Should it be a fixed name or do we want to allow some pattern? If we can agree on these questions, this should be ready to be merged -- and finally all our tests might pass 😉
I would say: use `.skip` and do not allow different names. That is easier to find with `find` and the like. For the verdicts, allow regular expressions such as `.*`.

I updated the code to use regular expressions that also match on the message and not only the result (7b67a7e), without changing the `.skip` files yet. If we also want to use regular expressions in the `.skip` files, I think the current space-separated format is not optimal (since there might also be spaces inside the regexes). Maybe we can also change the format to be more "category-based", to avoid duplicating the same regexes.
Yeah, using them in the `.skip` files would be needed. Category-based would also be nice. Perhaps use YAML or something else for which we already have a parser?
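A category-based YAML format could look roughly like this (purely a sketch; the keys, file layout, and regexes are made up for illustration, not part of this PR):

```yaml
# Hypothetical category-based .skip file: one verdict regex per category,
# so the same regex is not repeated for every test that shares it.
categories:
  overapproximation:
    verdict: "Unable to prove .* Reason: overapproximation of .*"
    tests:
      - input: sum_invariant.bpl
        settings: ReachSafety-32bit-Automizer.epf
        toolchain: ReachSafety.xml
  timeout:
    verdict: "Timeout .*"
    tests: []
```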
Just to bump, not that it lays dormant for too long :)
There are some conflicts now. I will try to resolve them next week and finalize this PR 😉
I just discovered some bugs and restarted the nightly. I think this PR should be ready now.
Just one thing I recently discovered: All tests that are marked as skipped also occur as "success" in Jenkins (and maybe even in JUnit?), e.g. see here and here. @maul-esel do you know why this is happening?
I do not know. When running the tests in Eclipse, it looks fine. But I get the same phenomenon when running the tests via the Maven command line:
```
Tests run: 3, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 27.804 s - in de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite
I_my-test_regression_sum_invariant.bpl S_my-test_regression_ReachSafety-32bit-Automizer.epf T_my-test_regression_ReachSafety.xml(de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite) skipped
I_my-test_regression_sum_invariant.bpl S_my-test_regression_ReachSafety-32bit-Automizer.epf T_my-test_regression_ReachSafety.xml(de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite) Time elapsed: 26.044 s
I_my-test_regression_sum_invariant.bpl S_my-test_regression_ReachSafety-32bit-GemCutter.epf T_my-test_regression_ReachSafety.xml(de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite) Time elapsed: 1.751 s
```
(the third line should not be there)
Currently many regression tests still fail (see #611). Therefore the Jenkins status ("unstable") is not a good indicator. However, some of the tests are expected to fail (and always have), because they only pass for some settings/toolchains (e.g. due to overapproximation).
This PR provides the possibility to mark those tests as skipped after running them. To this end, we store all tests (each consisting of a file, settings, and a toolchain) along with a verdict in a separate file. If a test fails with this verdict, it is marked as skipped; otherwise, it is marked as failed.
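The matching described above could be sketched like this (a minimal illustration, not the actual PR code; the class, the test-id format, and the `.skip` entry shape are all assumptions):

```java
import java.util.regex.Pattern;

// Minimal sketch (NOT the actual PR code): downgrade a failing test to
// "skipped" if the .skip file records the same test with a matching verdict.
public class SkipMatcher {

    // One hypothetical .skip entry: a test id (file + settings + toolchain)
    // and a regex for the expected verdict/message.
    public record SkipEntry(String testId, Pattern verdictRegex) {}

    // A failure counts as "skipped" only if both the test id and the
    // actual verdict match the recorded entry.
    public static boolean shouldSkip(SkipEntry entry, String testId, String actualVerdict) {
        return entry.testId().equals(testId)
                && entry.verdictRegex().matcher(actualVerdict).matches();
    }

    public static void main(String[] args) {
        String id = "sum_invariant.bpl+ReachSafety-32bit-Automizer.epf+ReachSafety.xml";
        SkipEntry entry = new SkipEntry(id,
                Pattern.compile("Unable to prove .* Reason: overapproximation of .*"));

        // Expected failure: reported as skipped.
        System.out.println(shouldSkip(entry, id,
                "Unable to prove the assertion Reason: overapproximation of bitwiseAnd")); // true
        // Unexpected failure: stays a real failure.
        System.out.println(shouldSkip(entry, id,
                "ExceptionOrErrorResult: NullPointerException")); // false
    }
}
```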
There are still some open points to discuss: Should we store the complete result message (e.g. `Unable to prove ... Reason: overapproximation of ...`) as verdict in this skipped-file?