I have also given this a lot of thought on how I can check/test the code, but I haven't come up with a good solution.
I am aware of the Mathematica testing framework, but it seems a bit cumbersome in this case.
A proper install check I can definitely do. It is easy to check whether all the functions I expect from my side (i.e. the ones I have defined), with their correct options, are available after install.
I will write such a function ASAP.
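As a rough sketch of what I have in mind (the package context is assumed to be QMRITools`, and the names in expectedFunctions are hypothetical placeholders for the real exported symbols):

```mathematica
(* Minimal install/load-check sketch. Assumes the package loads into the
   QMRITools` context; expectedFunctions holds placeholder names. *)
Get["QMRITools`"]

expectedFunctions = {"DixonReconstruct", "EpgSignal", "TensorCalc"};

(* a function counts as present if a matching symbol exists anywhere
   in the package context or its subcontexts *)
missing = Select[expectedFunctions, Names["QMRITools`*" <> #] === {} &];

If[missing === {},
  Print["Install check passed: all expected functions are defined."],
  Print["Install check FAILED, missing: ", missing]
]

(* Options[symbol] could be compared against an expected list in the same way. *)
```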
I have added a file and install check test, plus a test after loading to see if all functions are present,
and have added the tests to the demo.nb.
To make the download more convenient, I have moved all demo and testing files to the DemoAndTest directory.
That sounds good to me. Perhaps we can keep this issue open and see if one of the other reviewers (who knows about QMRI) can comment on what kind of other tests might be appropriate?
I think the demos, when combined with the install check above, will at least test that things are not completely broken. Users are likely to have more confidence in the package if the tests can be extended further to cover various possible corner cases, etc. From what I understand, the Mathematica API tends to have good backward compatibility across versions? If that is the case, then hopefully the existing functionality will continue to work without substantial modification as new Mathematica releases are made.
Due to the wide variety of functions in this toolkit, I think adding a comprehensive set of unit tests would be a large amount of work. I am also not familiar with how mature or easy to work with Mathematica's tooling for this is. A more limited approach would be to just add specific unit tests corresponding to bugs as they are encountered, to prevent the same problems from occurring again in the future.
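For what it's worth, such a bug-specific regression test would be fairly lightweight with the built-in framework; a sketch, where NormalizeData is a made-up stand-in for whichever function a bug was found in:

```mathematica
(* Sketch of a bug-specific regression test using the built-in testing
   framework. NormalizeData is a hypothetical placeholder function. *)
VerificationTest[
  NormalizeData[{1., 2., 3.}],   (* input that previously triggered the bug *)
  {1./3., 2./3., 1.},            (* expected, now-correct output *)
  SameTest -> (Max[Abs[#1 - #2]] < 10^-10 &),  (* tolerant numeric comparison *)
  TestID -> "issue-NN-normalize-regression"
]
```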
I assume that QMRITools has been developed during use on real-world data, but if this was human data it may not be possible to make it publicly available. As I'm sure you are aware, there are commercially available MRI phantoms for validating things such as fat fraction (Dixon imaging), or with calibrated T1 or T2 values for relaxometry, but these are expensive and require access to MRI scanner time to obtain the data. Due to the above, as well as the potentially large size of real-world data, synthetic test cases are probably more feasible, but can take quite a bit of time/work to implement.
Another option would be to reuse existing test data or adapt test cases from other appropriately licensed software packages. For example, I know that there are pretty extensive tests related to diffusion MRI implemented as part of the DIPY Python package. Reduced data sets (e.g. 3x3x56) are used in some DTI tests.
@grlee77 @JoshKarpel Mathematica has excellent backward compatibility. I myself only work in the latest version and, as stated, I will only provide support for the latest Mathematica version. In the 8 years I have been working on this, I have never had any issues upgrading to the latest version.
Indeed, all my functions are tested on real-life data, and there are many qMRI phantoms. However, phantom data serves the purpose of testing the entire MRI setup (machine, sequence, acquisition, etc.) and not only the code used for processing. Even getting phantom results to agree between centers using the same setup and software remains a challenge in most MRI applications.
For the most complex applications I have also implemented simulation frameworks, for example for DTI/DKI fitting, EPG-T2 fitting, and phase unwrapping. One example of the DTI simulations, and of the ability of the fitting to recover the true values, is given in the demo. In theory I could write some very simple fit tests on simulated data, but I feel that is a bit of cheating, since signals generated by the code are easy to fit while real-life data is not. And the robustness of most methods depends not only on noise, but also on artifacts, motion, etc.
In the demo file I provide some of those examples, for phase unwrapping, motion correction, EPG signal simulation, and DTI fitting. If you want, I could make them a bit more elaborate and clear.
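For illustration, such a round-trip test could look roughly like this, with hypothetical names standing in for the actual simulation and fitting functions:

```mathematica
(* Round-trip sketch: simulate with a known ground truth, fit, and check
   the recovered values. SimulateDtiSignal and DtiFit are hypothetical
   placeholders, not the real API. *)
groundTruth = {0.0015, 0.0004, 0.0004};  (* e.g. tensor eigenvalues in mm^2/s *)
signal = SimulateDtiSignal[groundTruth, "SNR" -> 50];
fitted = DtiFit[signal];

VerificationTest[
  Max[Abs[fitted - groundTruth]/groundTruth] < 0.05,  (* within 5% of truth *)
  True,
  TestID -> "dti-roundtrip-snr50"
]
```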
I think the demo you have is good and tests a substantial portion of the more complicated functions. This kind of integration testing certainly has value and as you correctly point out is a better indicator than unit tests of the performance on realistic data. I also agree with your points on validation via phantoms being complicated. Ultimately the end user should always validate the combination of software + scanner pulse sequence at their site as there are substantial differences among vendors in underlying sequence implementation and the types of pre-processing/corrections applied by the system.
I am not sure offhand roughly what percentage of the code base is covered under this demo, but perhaps you could just fill in some holes for a few things that don't seem to currently be covered (CardiacTools, CoilTools, PhysioTools, etc? unless I missed something).
@grlee77 Will do, I will cover more in the demo.
@grlee77 I have added more examples to the demo. It now covers around 60% of all functions. However, many functions are only used internally by other functions and are rarely called by a user directly. The demos touch around 80% of my code.
The functions covered in the demo are the ones that will be needed for 99% of daily use by a normal user. There are many functions I cannot give demos of, since they cover very specific experimental data that is difficult to generalize (mostly CoilTools and PhysioTools).
I will try to cover that in the new paper.md that I am preparing.
@JoshKarpel, in the JOSS review you asked "when you get a chance - any ideas for automated testing?"
Did you mean via a continuous integration service? I guess the need for a Mathematica license makes this a little more difficult than for the Python projects I have experience with. @mfoeling could potentially set something up to run on a machine with access to a license using a service like Jenkins, but I don't have any direct experience with it or know how much effort it is to set up. This would probably become something that is worth spending time on if the team of contributors grows, but may not be worth it at the moment.
I found the following blog post discussing a CI workflow for MATLAB, which has the same type of commercial license requirements: http://uwethuemmel.com/continuous-integration-workflow-with-matlab-git-and-jenkins/
Other libraries requiring GPU hardware that is not commonly available on free CI services also use this type of approach (e.g. CuPy).
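If someone does pursue this later, the kernel side can stay small: a headless runner script that Jenkins (or any service with access to a licensed kernel) invokes via wolframscript. A sketch, with an assumed file name and test path:

```mathematica
(* RunTests.wls -- hypothetical headless test runner for CI.
   Invoke with: wolframscript -file RunTests.wls
   Requires a licensed Wolfram kernel on the build machine. *)
report = TestReport["DemoAndTest/Tests.wlt"];   (* assumed test-file path *)

Print["Tests run:    ",
  report["TestsSucceededCount"] + report["TestsFailedCount"]];
Print["Tests failed: ", report["TestsFailedCount"]];

(* a nonzero exit code marks the CI build as failed *)
Exit[If[report["TestsFailedCount"] === 0, 0, 1]]
```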
@grlee77 A CI service would be nice, but as you mention, needing a Mathematica license makes that pretty non-trivial. I don't think it's necessary for now.
I think I meant "automated" in the sense of "I hit run on this notebook, sit back, and it tells me whether the package works right without me needing to interpret the results by hand". The "demo as integration testing" point noted above is quite reasonable, especially because @mfroeling has increased the demo coverage.
To make sure that one can check the intended behaviour of the three main processing algorithms (DTI, EPG-T2, and Dixon), I have now added proper simulations for all three cases. For that I have extended the EPG-T2 simulation framework and added a Dixon (gradient echo) simulator.
In the demo I show how people can use these simulations to test the fitting with respect to important parameters such as B1, B0, SNR, fat fraction, etc.
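As a sketch of the kind of automated check these simulations enable (sweeping one parameter and asserting bounded error; the function names are again hypothetical placeholders):

```mathematica
(* Sweep SNR on simulated Dixon data and check that the fat-fraction
   error stays bounded. SimulateDixonSignal and DixonFit are hypothetical
   stand-ins for the actual simulator and fitting routine. *)
trueFatFraction = 0.30;

errors = Table[
   Module[{signal, fitted},
     signal = SimulateDixonSignal[trueFatFraction, "SNR" -> snr];
     fitted = DixonFit[signal];
     Abs[fitted - trueFatFraction]
   ],
   {snr, {10, 25, 50, 100}}];

VerificationTest[Max[errors] < 0.02, True, TestID -> "dixon-snr-sweep"]
```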
Since validation and investigation of the limitations of these methods in clinical and multi-center studies are ongoing projects I am currently working on, I cannot provide more code for evaluating these specific methods yet; they are a work in progress, but will be updated when completed.
However, feel free to post any suggestions on how I can further improve the testability of such functions.
I'm happy on this issue if @grlee77 is happy.
Yes. Thanks for expanding the demos @mfroeling. Based on the descriptions in JOSS's review criteria pasted below, I would say this falls under the OK category.
Good: An automated test suite hooked up to an external service such as Travis-CI or similar
OK: Documented manual steps that can be followed to objectively check the expected functionality of the software (e.g. a sample input file to assert behaviour)
Bad (not acceptable): No way for you the reviewer to objectively assess whether the software works
I am happy with this as well.
I just want to add a few references to some useful information about using the Mathematica testing framework:
@rljacobson I will look into this. It looks like an easy thing to implement. Based on my initial experience with the testing framework during this review, I will definitely see how to integrate it more. I will keep the issue open for now and see if I can enhance things based on the articles you referenced.
@rljacobson I looked into the references you provided, very useful! For now I am keeping what I have, but I will definitely keep this functionality in mind for upcoming changes.
@rljacobson @grlee77 @JoshKarpel Since you are all happy with this, I will close the issue. Feel free to open a new one if needed.
I haven't the faintest idea how testable this kind of code is, but at the very least having some basic sanity checks ("has it been installed correctly?" type checks) would be nice.
It looks like Mathematica does have a built-in testing framework: https://reference.wolfram.com/language/tutorial/UsingTheTestingFramework.html
Perhaps a "tests" notebook could be supplied along with the "demo" notebook?
(Part of https://github.com/openjournals/joss-reviews/issues/1204)