hedyorg / hedy

Hedy is a gradual programming language to teach children programming. Gradual languages use different language levels, where each level adds new concepts and syntactic complexity. At the end of the Hedy level sequence, kids master a subset of syntactically valid Python.
https://www.hedy.org
European Union Public License 1.2

🪲 Unit tests do not appear to be broken while they are #5573

Closed · boryanagoncharenko closed this issue 2 weeks ago

boryanagoncharenko commented 1 month ago

Describe the bug The unit tests on main have been broken for 3 days, but we only noticed because they failed in a Weblate PR. Running the tests locally also yields no errors; the tests only fail when test caching is disabled.

How to reproduce Check out commit 25d3d4f123b6cfe5318a22abff968932ac8f661a, comment out the snippet_already_tested_with_current_hedy_version content and run test_level_15.py.
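For context, the snippet cache mentioned above works roughly like this: a passing snippet is recorded together with a hash of the Hedy sources, and skipped on later runs as long as that hash is unchanged. This is a minimal sketch of the idea only; the cache file name and the hashed file list are assumptions, not Hedy's actual layout.

```python
# Sketch of hash-keyed test-result caching (illustrative, not Hedy's real code).
import hashlib
import json
from pathlib import Path

CACHE_FILE = Path("already_tested_cache.json")  # hypothetical cache location


def current_source_hash(files=("hedy.py",)):
    """Hash the files whose changes should invalidate cached test results."""
    h = hashlib.sha256()
    for name in files:
        h.update(Path(name).read_bytes())
    return h.hexdigest()


def snippet_already_tested(snippet_id):
    """Return True if this snippet already passed under the current source hash."""
    if not CACHE_FILE.exists():
        return False
    cache = json.loads(CACHE_FILE.read_text())
    return cache.get(snippet_id) == current_source_hash()


def record_snippet_passed(snippet_id):
    """Remember that this snippet passed against the current sources."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    cache[snippet_id] = current_source_hash()
    CACHE_FILE.write_text(json.dumps(cache))
```

The failure mode in this issue falls out directly: if the hashed file list does not cover every file that affects transpilation, a code change leaves the hash unchanged and previously passing snippets are silently skipped.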

boryanagoncharenko commented 1 month ago

Commit 25d3d4f introduced multiple tests that check boolean values. These tests failed but we only noticed this when a Weblate PR failed.

Felienne commented 1 month ago

Thanks for diving in!!

> Commit 25d3d4f introduced multiple tests that check boolean values. These tests failed but we only noticed this when a Weblate PR failed.

> • Why didn't the PR author notice that the tests were broken? Probably because of test caching. As the PR author, I remember writing the tests and making sure they all passed. As usual, I then made code changes and verified them by running the full unit test suite locally. My guess is that the suite passed because caching was enabled.

Yes, that is a likely scenario: the tests passed first, then you made code changes, and then they were not run again.

> Perhaps we want to consider whether test caching should happen locally? Additionally, it is a good idea to expand the cached source code of Hedy to include not only the grammars and hedy.py, but also the hedy_translations, hedy_grammar, hedy_error files, etc.

That would surely have caught the issue (locally)! Do you know how to move forward doing that?
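One way to move forward on the suggestion above is to hash every file the transpiler depends on, not just hedy.py and the grammars, so a change to any of them invalidates the cache. A sketch under assumptions: the concrete file names below are guessed from the module names mentioned in this thread, and the `grammars/*.lark` glob is an assumption about the repo layout.

```python
# Sketch: broaden cache invalidation by hashing all core Hedy sources.
import hashlib
from pathlib import Path

# Hypothetical dependency set; extend as new core modules appear.
CORE_SOURCES = [
    "hedy.py",
    "hedy_translation.py",
    "hedy_grammar.py",
    "hedy_error.py",
]


def expanded_source_hash(root=Path(".")):
    """Combined hash over all files that can change transpilation output."""
    h = hashlib.sha256()
    for name in sorted(CORE_SOURCES):
        path = root / name
        if path.exists():
            h.update(name.encode())  # include the name so renames also invalidate
            h.update(path.read_bytes())
    for grammar in sorted((root / "grammars").glob("*.lark")):
        h.update(grammar.name.encode())
        h.update(grammar.read_bytes())
    return h.hexdigest()
```

With a hash like this, the scenario from this issue (editing a non-hashed core file after the snippets passed) would have changed the hash and forced the snippets to run again locally.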

> • Why didn't the PR reviewer notice that the tests were broken? Usually, reviewers focus on other aspects of the PR and do not run the full unit test suite locally. In fact, we rely on GitHub to yield an error if the tests do not pass.

Yes, we do, and I think we should try as best we can to avoid the need for manual checking.

> • Why did the unit tests pass on GitHub both before and after the merge? It seems that the broken tests were not executed at all. The logs indicate that only 8394 tests were executed, while the full test suite contains 32005. Based on the validate_tests script, the test_level, test_translation_level and test_snippets folders are skipped intentionally because they are slow. Note that the tests did get executed for the Weblate PR, because the validate_tests script explicitly runs the full suite when the PR comes from Weblate.

> Perhaps we should consider re-enabling them for all PRs?

Testing all snippets all the time would be very slow; that's why we do the caching in the first place. I am open to exploring other options, but I think expanding the hashes is the best path forward.
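The selection behaviour described above (skip the slow folders on ordinary PRs, run everything for Weblate PRs) boils down to a small piece of conditional logic. This is a rough sketch, not the real validate_tests script; the function name and the way the PR author is passed in are assumptions.

```python
# Sketch of the folder-selection rule attributed to validate_tests in this thread.
SLOW_FOLDERS = {"test_level", "test_translation_level", "test_snippets"}


def folders_to_run(all_folders, pr_author):
    """Run the full suite for Weblate PRs; otherwise skip the slow folders."""
    if pr_author == "weblate":
        return list(all_folders)
    return [f for f in all_folders if f not in SLOW_FOLDERS]
```

The trade-off discussed in this thread is visible here: the skip list keeps ordinary PR runs fast, but it also means a regression in the slow folders only surfaces once a Weblate PR happens to trigger the full suite.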

boryanagoncharenko commented 3 weeks ago

As discussed at the contributors' meeting, I will do the following for this task:

boryanagoncharenko commented 3 weeks ago

Updated the "Running unit tests" section in the wiki.