kmaziarz closed this pull request 7 months ago.
I notice you put all the tests in `test_models.py`. Perhaps it would make more sense to put them in `test_cli.py` instead?
Would you put this new file in the same directory, i.e. `tests/reaction_prediction/inference`? Alternatively, these could be placed in a new directory `tests/cli`, but then I assume the fixtures would have to live really high up in this directory hierarchy to be usable by both tests?
Good question: I guess the CLI tests could be top-level? But I agree this makes the fixtures harder to find. Feel free to leave it as it is.
I realized the CLI tests do not share fixtures with the models tests; they only share the logic for skipping a test if the single-step models are not installed. I have now extracted that logic into a separate utility, which allowed splitting the CLI tests out into `tests/cli/test_cli.py`.
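(For reference, a minimal sketch of what such a shared skip utility could look like; the helper name and the way packages are detected are illustrative, not necessarily what this PR implements:)

```python
# Hypothetical sketch of a shared "skip if the model package is not installed"
# utility, usable from both the model tests and the new CLI tests.
import importlib.util

import pytest


def skip_if_model_not_installed(package_name: str) -> None:
    """Skip the current test if the given single-step model package is missing."""
    if importlib.util.find_spec(package_name) is None:
        pytest.skip(f"Package '{package_name}' is not installed")
```

pytest's built-in `pytest.importorskip` covers a similar need when the check is simply "is this package importable".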
So far, our search CLI was not being tested in CI at all, while the single-step evaluation CLI was only covered by some unit tests using synthetic models/data. This made it quite easy to break the CLI, or to break the model wrappers in a way that only becomes evident when e.g. using them end-to-end during search.
This PR adds full end-to-end tests for all of our CLI entry points, using all single-step models (apart from GLN), and, in the case of the search CLI, all main search algorithm classes. This increases coverage of `cli/{main, search}.py` from 0% to ~90%, and of `cli/eval_single_step.py` from 75% to 85%. Additionally, testing end-to-end search adds coverage of e.g. the utilities for dumping routes into `*.pdf` files, which were also not being tested. Overall, this PR improves test coverage from 81% to 91%.

These new tests uncovered two issues related to returning probability values: one in Chemformer (introduced in recent refactoring, evident when running search), and one in RootAligned (introduced a long time ago, evident when running single-step evaluation with saving of results turned on). Both of these are now also fixed as part of this PR.

Finally, I also randomly uncovered a quirk of MEGAN which causes it to fail if one sets `device="cpu"` despite a GPU being available (this is because there are a lot of `torch.cuda.is_available()` calls in the MEGAN code that "contradict" the fact that we've moved the model to CPU). For now, I just raise an explicit error in that case, as it's likely not a very useful scenario anyway.
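(As a rough sketch, the explicit error for the MEGAN case could be as simple as the guard below; this assumes the wrapper sees the requested device as a string, and the exact check and message in the PR may differ:)

```python
import torch


def _check_megan_device(device: str) -> None:
    """Fail fast for the unsupported "CPU on a GPU machine" combination."""
    # MEGAN's internals call torch.cuda.is_available() in many places, which
    # "contradicts" having moved the model to CPU, so reject it explicitly.
    if device == "cpu" and torch.cuda.is_available():
        raise ValueError(
            "MEGAN cannot run on CPU while a GPU is available; "
            "either run on GPU or hide the GPU (e.g. via CUDA_VISIBLE_DEVICES='')."
        )
```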