joshuacwnewton opened 3 years ago
One problem with this is that we're currently redownloading the test dataset on every `pytest` run. If we increase the size of `sct_testing_data`, then redownloading starts to become a problem.
But this is an issue we've already wanted to address; see: https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/2959.
> One problem with this is that we're currently redownloading the test dataset on every `pytest` run. If we increase the size of `sct_testing_data`, then redownloading starts to become a problem.
This was fixed by #3480. :slightly_smiling_face:
Currently, SCT maintains 2 different datasets for testing:

- `sct_example_data`: Full-sized images, used mainly for `batch_processing.sh`, but not actually available during regular testing.
- `sct_testing_data`: A set of pre-processed images (cropped, resampled, and run through SCT tools, e.g. segmentation), created to speed up testing. This is downloaded at the start of every `pytest` run.

The problem is, sometimes during testing we really do want access to the full-sized images (see for example https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/3468#discussion_r671696767), and `sct_testing_data` starts to feel unrepresentative of real-world data.

At that point, if we are including full-sized images in `sct_testing_data`, is there still a benefit to keeping the two datasets separate? I'm wondering if it would just be easier to keep both types of data (raw, processed) in the same dataset, then use that dataset for both `batch_processing.sh` and our test suite.

If we do merge them, we would have to think about dataset structure. For example, maybe we could follow a BIDS-like approach and use a `derivatives` folder for the pre-processed images?
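As a rough sketch of what that BIDS-like split could look like, raw images would sit at the top level and pre-processed outputs under `derivatives/`. The subject/file names and the `labels` pipeline name below are invented for illustration, not an actual SCT convention:

```python
# Hypothetical BIDS-like layout for a merged dataset (names are illustrative):
#   sub-01/anat/sub-01_T2w.nii.gz                        <- raw, full-sized
#   derivatives/labels/sub-01/anat/sub-01_T2w_seg.nii.gz <- pre-processed
from pathlib import Path


def derivative_path(raw_rel_path, pipeline="labels"):
    """Map a raw image's relative path to its derivatives/ location (sketch)."""
    return Path("derivatives") / pipeline / Path(raw_rel_path)
```

One advantage of this convention is that tests needing raw data and tests needing pre-processed data can address the same dataset through a single, predictable path mapping.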