nipy / mindboggle

Automated anatomical brain label/shape analysis software (+ website)
http://mindboggle.info

Data and doc string Examples #7

Closed: binarybottle closed this issue 11 years ago

binarybottle commented 11 years ago

Throughout the Mindboggle code base, I've included Examples in the documentation with lines like the following:

>>> import os
>>> path = os.environ['MINDBOGGLE_DATA']
>>> sulci_file = os.path.join(path, 'arno', 'features', 'sulci.vtk')

where MINDBOGGLE_DATA is an environment variable set according to the instructions at http://mindboggle.info/users/installation.html.

Is this reasonable, or is there a better way for users to try out functions?

I also wrote these Examples with the goal of testing the code within doc strings via sphinx, and of carefully unit testing everything, but I haven't had time to do this yet.

satra commented 11 years ago

you could consider two options:

  1. A mindboggle data package: pip install mindboggle-data would install the package together with the necessary data elements
  2. A mindboggle function: from mindboggle import get_test_data
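
a sketch of how small option 2 could be, assuming a tiny 'data' directory shipped inside the package (that directory is an assumption, not existing mindboggle layout):

    import os

    def get_test_data():
        """Return the path to the small test dataset bundled with mindboggle.

        Sketch only: assumes a 'data' directory ships inside the package.
        """
        return os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')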

Whether i would stay away from using the data for doctests depends on how large the datasets are. for doctests you can craft a dataset that's fairly light and include it with mindboggle, but leave regression and other tests to unit tests. you can decorate those to skip the longer tests if the larger test data are not available.
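
for example, a heavier test could be skipped like this (a sketch: the test class is hypothetical, and MINDBOGGLE_DATA just mirrors the example at the top of this issue):

    import os
    import unittest

    # path to the large test data, if the user has installed it
    BIG_DATA = os.environ.get('MINDBOGGLE_DATA')

    class TestSulciRegression(unittest.TestCase):

        @unittest.skipUnless(BIG_DATA, 'large test data not available')
        def test_sulci_file_present(self):
            # heavier regression-style check, skipped when the data are absent
            sulci_file = os.path.join(BIG_DATA, 'arno', 'features', 'sulci.vtk')
            self.assertTrue(os.path.exists(sulci_file))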

ohinds commented 11 years ago

+1 for (1) and satra's answer.

forrestbao commented 11 years ago

Actually, I prefer Arno's old way. I think it's easy for users to point to where they store their data. If we use the other two options, would it be difficult for users to run the Mindboggle pipeline on their own data? For example, would they need to define paths in mindboggle.get_test_data?

satra commented 11 years ago

@forrestbao: users should be able to run mindboggle regardless of whether this test data or the environment variable exists. if mindboggle's core code depends on the environment variable, then that data is integral to mindboggle and should be distributed as a dependency (via a mindboggle-data package).

doctests should not depend on large data - you want them to run very quickly and to demonstrate the point (and potentially provide code coverage) rather than serve as regression tests. in fact, i think the data used for doctests should be part of the mindboggle package.
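
for example, a docstring Example that needs no external files at all (mean_depth here is made up, just to show the shape of such a doctest):

    import numpy as np

    def mean_depth(depths):
        """Return the mean of a sequence of depth values.

        Examples
        --------
        >>> mean_depth([1.0, 2.0, 3.0])
        2.0
        """
        return float(np.mean(depths))

such doctests can then be run with python -m doctest path/to/module.py -v, or collected by nose via its --with-doctest option.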

forrestbao commented 11 years ago

Thank you for your explanation, @satra. I see, this is for test data. Then I +1 your answer (1) too.

binarybottle commented 11 years ago

in addition to the test data for running the examples, the data that users might need to run mindboggle include:

  1. the DKT40 or DKT100 atlas
  2. the freesurfer templates made from Mindboggle-101 data
  3. a pickle file containing fundus likelihood depth/curvature distribution training data

Item 1 might not be necessary if we require users to run the newest freesurfer with DKT40 labeling.

Item 2 might not be necessary if we disable multi-atlas, multi-registration-based labeling as an alternative to item 1.

Item 3 is currently necessary to run the compute_likelihood() function.

binarybottle commented 11 years ago

  1. so the idea is that, in addition to the mindboggle software being available as a github repository, it would also be available as a distributed package together with the data in my previous comment and the test data? and to make it available via pip install, what more would i need to do to the present code base?
  2. when you say "doctest", this does not necessarily mean a test within a docstring that is run when executing all tests, does it? is it a good idea to have externally executable tests in the docstrings, to ensure that the documentation and examples stay current? if so, how do you set up and run such a test?
  3. i work best from examples. could someone please write a unit test that i can model all other unit tests after?
satra commented 11 years ago

  1. single source of mindboggle:
     a. mindboggle as is, with any necessary data
     b. a separate mindboggle-data package containing data only
     (a bare-bones packaging sketch follows after item 3 below)

    for an example see: http://nipy.sourceforge.net/nibabel/devel/data_pkg_design.html#data-package-design

  2. doctests are both examples and tests

    when you say "doctest", this does not necessarily mean a test within a docstring that is run when executing all tests, does it?

    yes, it does.

    is it a good idea to have externally executable tests in the docstrings, to ensure that the documentation and examples are current? if so, how do you set up and run such a test?

    i don't know what this means, but you should always run the tests before building the docs, to ensure that the docs are built on code whose tests pass.

  3. regarding unit tests, there are plenty around: again, see nipy (a minimal sketch follows below).
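
for (1), a bare-bones layout for a data-only package might look something like this (just a sketch: the directory names and setup.py below are placeholders, not the nibabel data-package design linked above):

    # hypothetical layout (all names are placeholders):
    #
    #   mindboggle-data/
    #       setup.py
    #       mindboggle_data/
    #           __init__.py
    #           atlases/
    #           templates/
    #
    # setup.py for the data-only package:
    from setuptools import setup, find_packages

    setup(name='mindboggle-data',
          version='0.1',
          packages=find_packages(),
          package_data={'mindboggle_data': ['atlases/*', 'templates/*']},
          include_package_data=True)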
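
and for (3), a minimal test in the numpy.testing style used across nipy projects (normalize_values is a made-up stand-in for whatever mindboggle function is under test):

    # test_normalize.py (sketch only)
    import numpy as np
    from numpy.testing import assert_array_almost_equal

    def normalize_values(values):
        """Scale values linearly to the range [0, 1]."""
        values = np.asarray(values, dtype=float)
        vmin, vmax = values.min(), values.max()
        return (values - vmin) / (vmax - vmin)

    def test_normalize_values():
        result = normalize_values([2.0, 4.0, 6.0])
        assert_array_almost_equal(result, [0.0, 0.5, 1.0])

run it with nosetests test_normalize.py, or any runner that discovers test_* functions.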

finally: