Open davidsebfischer opened 3 years ago
Let's break this up into separate issues under a testing milestone, first step would be to carry over what we have into pytest i think.
As soon as all the tests pass (currently many fail) you can assign adding the testing CI to me.
Let's record which unit tests are good to go, by sub module:
Submodules which tests that require cached data that have to be deactivated in CI:
pytest --ignore=data in the CI should work
on this note: I think a data unit test that subsets our DatasetSupergroupSfaira to a single (smallish) dataset, downloads it and loads it would be super crucial to have in the CI.
Should be doable, right @Zethson ?
on this note: I think a data unit test that subsets our DatasetSupergroupSfaira to a single (smallish) dataset, downloads it and loads it would be super crucial to have in the CI.
Should be doable, right @Zethson ?
Depends on the size and whether the data is publicly available, but besides that of course.
yeah, can just pick a small, public one (order of megabytes).
for example this one: https://github.com/theislab/sfaira/blob/dev/sfaira/data/dataloaders/loaders/d10_1038_s41591_019_0468_5/human_lung_2019_dropseq_braga_001.py
I'll make this my fist unit test contribution in a long time. should we have a separate folder for this (as we wanted to exclude the tests in the data folder from the CI?)
should we have a separate folder for this (as we wanted to exclude the tests in the data folder from the CI?)
Sounds good, let s just collect all of these tests in a folder cached
or so?
so cached would be your tests excluded from CI, right?
I might have misunderstood this. Are you talking about data
or data_contribution
?
all I want to do is determine the correct folder (potentially a new one) where I can put a dataloading test that will be run by the CI.
Either keep them where they are or under a new folder cached
I would say.
looking through the existing tests in "data" they are a mix of tests that do and do not require local data. would it make sense to separate those out? ie. have one subdirectory of unit_tests (called "local") or something that contains all the tests which require local data (and hence should be excluded from CI?)
Let s just move all of these tests to whatever data set we decide to use in the CI? The current selection of data set used there is arbitrary.
so we're aiming to make all the data tests compatible with the CI now?
I'm not very well versed with pytest and / or CIs: I thought that each test is kind of run in it's own isolated environment, so we would have to download the dataset separately for every test? or is there a ways to share a file between different tests?
is there a ways to share a file between different tests?
@Zethson?
so we're aiming to make all the data tests compatible with the CI now?
Ideally yes.
I'm not very well versed with pytest and / or CIs: I thought that each test is kind of run in it's own isolated environment, so we would have to download the dataset separately for every test?
Although that's possible (keywords are tox
or nox
) we won't make use of that.
Or is there a ways to share a file between different tests?
Sure. The way the CI works it that it simply clones the whole repository and then does what we tell it to do (run pytest). Hence, if you have a folder "test_data" somewhere all tests will be able to access it.
Awesome, I am already using sfaira/unit_tests/test_data
so we can just keep that! It s also in the gitignore so nice to deal with locally.
Nice, is it acceptable / good practice that one test depends on other tests having run already? I.e. if I put the test for the download first, I would not have to reissue the download command in each of the other tests depending on that dataset.
Or would this be considered bad practise?
Nice, is it acceptable / good practice that one test depends on other tests having run already? I.e. if I put the test for the download first, I would not have to reissue the download command in each of the other tests depending on that dataset.
Or would this be considered bad practise?
I think that you are looking for pytest fixtures. Basically we can label tests that they need a certain setup before running. The fixture. Instead of me trying to explain this I would refer you to the documentation which certainly does a better job than I would.
@davidsebfischer this would be a nice solution for the data CI tests, what do you think?
I am fixing unit tests in the linked PR.
@davidsebfischer Ideally we also add a CI job to run the tests. Usually you differentiate between fast unit tests that may even mock objects that can be run everywhere. Those should also be run in a CI job. Contrary, integration tests which require data downloading etc should not be run in a CI environment.
I did not yet look at the tests, but expect there to be some integration tests. There surely is a pytest way to name them to exclude them from normal pytest runs.