theislab / sfaira

data and model repository for single-cell data
https://sfaira.readthedocs.io
BSD 3-Clause "New" or "Revised" License
135 stars 11 forks source link

switch unit tests to pytest & add CI job for tests #106

Open davidsebfischer opened 3 years ago

Zethson commented 3 years ago

@davidsebfischer Ideally we also add a CI job to run the tests. Usually you differentiate between fast unit tests that may even mock objects that can be run everywhere. Those should also be run in a CI job. Contrary, integration tests which require data downloading etc should not be run in a CI environment.

I did not yet look at the tests, but expect there to be some integration tests. There surely is a pytest way to name them to exclude them from normal pytest runs.

davidsebfischer commented 3 years ago

Let's break this up into separate issues under a testing milestone, first step would be to carry over what we have into pytest i think.

Zethson commented 3 years ago

As soon as all the tests pass (currently many fail) you can assign adding the testing CI to me.

davidsebfischer commented 3 years ago

Let's record which unit tests are good to go, by sub module:

Submodules which tests that require cached data that have to be deactivated in CI:

Zethson commented 3 years ago

pytest --ignore=data in the CI should work

le-ander commented 3 years ago

on this note: I think a data unit test that subsets our DatasetSupergroupSfaira to a single (smallish) dataset, downloads it and loads it would be super crucial to have in the CI.

Should be doable, right @Zethson ?

Zethson commented 3 years ago

on this note: I think a data unit test that subsets our DatasetSupergroupSfaira to a single (smallish) dataset, downloads it and loads it would be super crucial to have in the CI.

Should be doable, right @Zethson ?

Depends on the size and whether the data is publicly available, but besides that of course.

le-ander commented 3 years ago

yeah, can just pick a small, public one (order of megabytes).

for example this one: https://github.com/theislab/sfaira/blob/dev/sfaira/data/dataloaders/loaders/d10_1038_s41591_019_0468_5/human_lung_2019_dropseq_braga_001.py

I'll make this my fist unit test contribution in a long time. should we have a separate folder for this (as we wanted to exclude the tests in the data folder from the CI?)

davidsebfischer commented 3 years ago

should we have a separate folder for this (as we wanted to exclude the tests in the data folder from the CI?)

Sounds good, let s just collect all of these tests in a folder cached or so?

le-ander commented 3 years ago

so cached would be your tests excluded from CI, right?

davidsebfischer commented 3 years ago

I might have misunderstood this. Are you talking about dataor data_contribution?

le-ander commented 3 years ago

all I want to do is determine the correct folder (potentially a new one) where I can put a dataloading test that will be run by the CI.

davidsebfischer commented 3 years ago

Either keep them where they are or under a new folder cachedI would say.

le-ander commented 3 years ago

looking through the existing tests in "data" they are a mix of tests that do and do not require local data. would it make sense to separate those out? ie. have one subdirectory of unit_tests (called "local") or something that contains all the tests which require local data (and hence should be excluded from CI?)

davidsebfischer commented 3 years ago

Let s just move all of these tests to whatever data set we decide to use in the CI? The current selection of data set used there is arbitrary.

le-ander commented 3 years ago

so we're aiming to make all the data tests compatible with the CI now?

I'm not very well versed with pytest and / or CIs: I thought that each test is kind of run in it's own isolated environment, so we would have to download the dataset separately for every test? or is there a ways to share a file between different tests?

davidsebfischer commented 3 years ago

is there a ways to share a file between different tests?

@Zethson?

Zethson commented 3 years ago

so we're aiming to make all the data tests compatible with the CI now?

Ideally yes.

I'm not very well versed with pytest and / or CIs: I thought that each test is kind of run in it's own isolated environment, so we would have to download the dataset separately for every test?

Although that's possible (keywords are tox or nox) we won't make use of that.

Or is there a ways to share a file between different tests?

Sure. The way the CI works it that it simply clones the whole repository and then does what we tell it to do (run pytest). Hence, if you have a folder "test_data" somewhere all tests will be able to access it.

davidsebfischer commented 3 years ago

Awesome, I am already using sfaira/unit_tests/test_data so we can just keep that! It s also in the gitignore so nice to deal with locally.

le-ander commented 3 years ago

Nice, is it acceptable / good practice that one test depends on other tests having run already? I.e. if I put the test for the download first, I would not have to reissue the download command in each of the other tests depending on that dataset.

Or would this be considered bad practise?

Zethson commented 3 years ago

Nice, is it acceptable / good practice that one test depends on other tests having run already? I.e. if I put the test for the download first, I would not have to reissue the download command in each of the other tests depending on that dataset.

Or would this be considered bad practise?

I think that you are looking for pytest fixtures. Basically we can label tests that they need a certain setup before running. The fixture. Instead of me trying to explain this I would refer you to the documentation which certainly does a better job than I would.

le-ander commented 3 years ago

@davidsebfischer this would be a nice solution for the data CI tests, what do you think?

davidsebfischer commented 3 years ago

I am fixing unit tests in the linked PR.