How should testing be done?

sofroniewn commented 4 years ago

Following on from the discussion in #1 on whether we want to maintain a library like this, a big question is how testing should be done.

There are a variety of approaches:

imageio has a dedicated imageio/imageio-binaries repo with small files that it uses for testing.
meshio has a folder with small files inside its tests folder in the main repo. I suspect those don't get put inside the actual package on PyPI though.

If this repo starts depending on these libraries, would we vendor these files for our own testing purposes? That sort of depends on how people feel about #1 and, looking at the list of potential readers in #2, how we'd go about things.

tlambert03 commented 4 years ago

I've always wondered how the bioformats group did it. In the readme, they reference data tests "for internal developers only" (presumably some big folder of test files?)

For overlapping file format support, it might be nice to have some asv benchmarks set up too.
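To make the asv idea concrete, here is a minimal sketch of what a benchmark comparing two readers for the same format could look like. It relies on asv's convention of benchmark classes with `setup`/`teardown` methods, `time_`-prefixed methods, and `params`/`param_names` attributes. The two readers (`read_as_list`, `read_as_array`), the `READERS` mapping, and the synthetic `sample.raw` file are all hypothetical stand-ins, not part of this repo or of any real plugin.

```python
import array
import os
import tempfile


def read_as_list(path):
    """Hypothetical reader A: load raw bytes into a Python list."""
    with open(path, "rb") as f:
        return list(f.read())


def read_as_array(path):
    """Hypothetical reader B: load raw bytes into an array of unsigned bytes."""
    data = array.array("B")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


# Stand-ins for two libraries that can both read the same file format.
READERS = {"list": read_as_list, "array": read_as_array}


class TimeOverlappingReaders:
    # asv runs each time_* method once per parameter value.
    params = sorted(READERS)
    param_names = ["reader"]

    def setup(self, reader):
        # Write a small synthetic file so the benchmark needs no downloads.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, "sample.raw")
        with open(self.path, "wb") as f:
            f.write(bytes(range(256)) * 64)

    def teardown(self, reader):
        self.tmpdir.cleanup()

    def time_read(self, reader):
        READERS[reader](self.path)
```

In a real setup the synthetic file would presumably be replaced by a small shared test file, and the stand-in readers by the actual libraries being compared.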

sofroniewn commented 4 years ago

We can ask @joshmoore about how bioformats does testing; maybe they can even share with us a subset of the files they are allowed to make public. I'm also curious what his thoughts on #1 and #2 are and if he thinks we're biting off more than we can chew :-)

GenevieveBuckley commented 4 years ago

I think bioformats use these files hosted online for testing: https://samples.scif.io/

GenevieveBuckley commented 4 years ago

Here's how one third party library (pims) uses the images from https://samples.scif.io/ for testing:

https://github.com/soft-matter/pims/blob/master/download_bioformats_test.py
https://github.com/soft-matter/pims/blob/master/pims/tests/test_bioformats.py
...although it seems there is also a single .nd2 file hosted in their own repository for testing as well: https://github.com/soft-matter/pims/tree/master/pims/tests/data/bioformats
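For reference, the general "download test files on demand and cache them" pattern could be sketched roughly as follows. This is not the pims script, just a minimal standard-library illustration. The function name `fetch_sample`, the cache layout, and the file name in the usage note are assumptions; only the base URL comes from this thread.

```python
import shutil
import urllib.request
from pathlib import Path

# Base URL mentioned in this thread; everything else here is illustrative.
BASE_URL = "https://samples.scif.io/"


def fetch_sample(name, cache_dir, base_url=BASE_URL):
    """Return a local path to `name`, downloading it only if not cached."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists():
        # Already downloaded on a previous run: skip the network entirely.
        return target

    url = base_url.rstrip("/") + "/" + name
    # Download to a temporary name first so an interrupted transfer
    # never leaves a truncated file that looks like a valid cache entry.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

A test could then call something like `fetch_sample("example.zip", "~/.cache/test-data")` (with the path expanded, and with `example.zip` standing in for a real file name from the server) in a fixture, so the files never need to live in the repository or in the package on PyPI.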

joshmoore commented 4 years ago

Hi folks. The files for which we've been given an open license to share are all available under https://downloads.openmicroscopy.org/images/. We're being more aggressive about requesting submitted files be public, but we're definitely not at 100% yet.

The combined public and private data is several terabytes backed by GPFS and we run all of it through Jenkins a couple times a day for testing new PRs, regressions, performance changes, etc. Originally, all the data under https://samples.scif.io/ was contained within the total set, but I haven't double checked that recently.

If the public data needs mirroring somewhere, please let me know. No need to wget through the web proxy. I can also provide a file listing with file sizes if that's useful. We can also go back and request that more historical data be made public, but that kind of footwork takes time.

As for #1 and #2, I'll do some pondering and get back to you. One first thought though: obviously, if you're doing it for yourself, then encapsulation makes sense and I don't see any objections. The problem will be if other libraries start depending on it. The pims example is a good one, see https://github.com/soft-matter/pims/issues/323#issuecomment-492370392. Eventually this kind of support can become a drain:

Although I'd like to see our effort being concentrated more on common formats a la https://forum.image.sc/t/next-generation-file-formats-for-bioimaging/31361, I understand that the Python community will need a solution sooner. I'd urge you to share that burden around.

cc: @sbesson @melissalinkert @dgault

sofroniewn / napari-io-test
