msdk / msdk

MSDK source code repository

Reorganization of files for testing #85

Closed nilshoffmann closed 8 years ago

nilshoffmann commented 8 years ago

Hi, the current testing setup keeps all relevant files local to the module that supports the corresponding file format. Since we use Maven, these files are correctly placed in the src/test/resources folder. However, this means that other modules cannot use these files for their own tests with the standard configuration. It seems possible to share them by customizing the JAR plugin:

http://stackoverflow.com/questions/174560/sharing-test-code-in-maven

Additionally, the modules currently store all test files in the root of the class path. This may be a matter of taste, but would it be sensible to use the module's artifactId as a subpath to the files? For example, in the msdk-io-mztab module, the file Sample-2.3.mzTab would move to msdk-io-mztab/Sample-2.3.mzTab.
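To make the artifactId-as-subpath idea concrete, here is a minimal sketch of how a test could resolve such a file from the class path. The class `TestResourcePaths` and its methods are hypothetical names chosen for illustration and are not part of MSDK; the only assumption is that each module's test files live under a folder named after its artifactId.

```java
import java.io.InputStream;

/** Hypothetical helper for illustration only; not part of MSDK. */
final class TestResourcePaths {

    private TestResourcePaths() {
    }

    /** Builds "<artifactId>/<fileName>", e.g. "msdk-io-mztab/Sample-2.3.mzTab". */
    static String resourcePath(String artifactId, String fileName) {
        if (artifactId == null || artifactId.isEmpty()) {
            throw new IllegalArgumentException("artifactId must not be empty");
        }
        if (fileName == null || fileName.isEmpty()) {
            throw new IllegalArgumentException("fileName must not be empty");
        }
        return artifactId + "/" + fileName;
    }

    /** Opens the file from the class path; returns null if it is not found. */
    static InputStream open(String artifactId, String fileName) {
        // ClassLoader paths are relative to the class path root, so no leading slash.
        return TestResourcePaths.class.getClassLoader()
                .getResourceAsStream(resourcePath(artifactId, fileName));
    }
}
```

A test would then call something like `TestResourcePaths.open("msdk-io-mztab", "Sample-2.3.mzTab")` and check the result for null before reading it.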

We could also think about adopting a mime-type scheme:

application/x-netcdf/testdata.cdf
text/x-mztab/Sample-2.3.mzTab
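As a rough sketch of the MIME-type scheme, the lookup below derives the folder from the file extension. The class `MimeTypeResourcePaths` is a hypothetical name, and the table contains only the two mappings from the example above; any further entries would have to be agreed on first.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of MSDK. */
final class MimeTypeResourcePaths {

    // Only the two mappings from the example; other formats are deliberately absent.
    private static final Map<String, String> MIME_BY_EXTENSION = new HashMap<>();

    static {
        MIME_BY_EXTENSION.put("cdf", "application/x-netcdf");
        MIME_BY_EXTENSION.put("mztab", "text/x-mztab");
    }

    private MimeTypeResourcePaths() {
    }

    /** Builds "<mime-type>/<fileName>" from the file extension (case-insensitive). */
    static String resourcePath(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No file extension: " + fileName);
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String mimeType = MIME_BY_EXTENSION.get(extension);
        if (mimeType == null) {
            throw new IllegalArgumentException("No MIME type registered for: " + fileName);
        }
        return mimeType + "/" + fileName;
    }
}
```

One trade-off of this layout is that the path no longer says which module owns a file, which the artifactId scheme does.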

tomas-pluskal commented 8 years ago

Hi Nils, to be honest, I like the current setup where each module has its own test data in src/test/resources/. It is simple and clear. If we put all test data into a single folder, after a few years we will end up with hundreds of data files in that folder and no easy way to see which of them are used by which modules (or whether they are used at all).

As for further organization of the test data, I am not against it, but I also don't see much advantage, especially since most modules now have only a few test files. If you wish to spend time on refactoring this, feel free to submit a pull request.

Personally, I would prefer to see some interesting Maltcms methods being ported to MSDK :)

dyrlund commented 8 years ago

I agree with Tomáš and like the simplicity of the current structure. But I also think the high redundancy in the test files is a problem. For example, orbitrap_300-600mz.mzML is present 7 times. If we shared this file between the modules, we would save about 8.5 MB of data.

Could we have a compromise where a set of standard files is shared between the modules (e.g. one mzML file, one MS/MS file, one CSV, one mzTab, …)? Specific test files would then still remain in each module.

tomas-pluskal commented 8 years ago

Thomas, 8.5 MB is nothing. If we talk about 8.5 GB, I would say it is worth saving :)

dyrlund commented 8 years ago

In MB it might not be much, but since the repository is only 115 MB, it does account for 7.4% of the used space :)

nilshoffmann commented 8 years ago

I will come back to this later, when I try to run some integration tests with large NetCDF files from GCxGC-MS. I will probably host them externally.