nilearn / nistats

Modeling and statistical inference on fMRI data in Python
BSD 3-Clause "New" or "Revised" License

[Feature request] add possibility to save nistats models #389

Closed PeerHerholz closed 5 years ago

PeerHerholz commented 5 years ago

Ahoi hoi everyone,

I was just redoing some previous analyses that were done in nistats. While doing so, I thought about a potential new feature that could be added.

What would you like changed/added and why?

I would like to be able to save nistats models, e.g. as a dict or JSON. This would enhance reproducibility (models could be shared, regenerated from file, and rerun) and would also improve documentation.

What would be the benefit? Does the change make something easier to use?

Models could be saved for later inspection and evaluation, as well as shared. As nistats is already very straightforward and easy to use (thanks for that!), I don't think that this feature would make things easier to use, but it would add a new layer/level of usage (e.g., generate models from files, rerun models, etc.).

Clarifies something?

No.

If it is a new feature, what is the benefit?

As outlined above: increase reproducibility and documentation through model inspection and evaluation, sharing, generation from file and rerunning.

It would be cool to hear your thoughts on this.

Best regards, Peer

kchawla-pi commented 5 years ago

Hi. Would Joblib's load and dump work for you? https://joblib.readthedocs.io/en/latest/persistence.html
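For readers unfamiliar with the suggestion, here is a minimal sketch of a `joblib.dump` / `joblib.load` round trip. It does not use Nistats itself: the `model_state` dict, its keys, and the file name are hypothetical stand-ins for a fitted model, chosen only so the example is self-contained.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for a fitted model: a plain dict holding arrays.
# A real Nistats model object would be passed to joblib.dump the same way.
rng = np.random.default_rng(0)
model_state = {
    "design_matrix": rng.normal(size=(10, 3)),
    "betas": np.array([0.5, -1.0, 2.0]),
    "noise_model": "ar1",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")  # file name is arbitrary
    joblib.dump(model_state, path)   # serialize the object to disk
    restored = joblib.load(path)     # read it back into memory
```

The restored object is a copy of the original, including its NumPy arrays. Because joblib builds on pickle, the file should only be loaded from trusted sources.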

PeerHerholz commented 5 years ago

Hi,

sorry for the delayed reply. Yeah, this looks like what I was thinking about. Is there any reason to not "simply" use json? In case you think it would be worth adding to nistats and no one else is on it, I would be happy to give it a try.

jeromedockes commented 5 years ago

> Is there any reason to not "simply" use json?

It would not be practical to represent Nistats objects using only JSON or any other text format. Remember that they reference numpy arrays, Nifti images (the masks), and many other Python objects. To store Nistats objects in a reusable way, one would have to decide how all this information should be organized and stored on disk. It could be useful but would require a bit of thought and work.
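The limitation can be illustrated with a small sketch. It shows that `json.dumps` rejects a NumPy array, and then one possible (not Nistats-endorsed) layout in which plain metadata goes to JSON and arrays go to a NumPy `.npz` archive. The `state` dict and its keys are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io
import json

import numpy as np

# Hypothetical model state mixing plain values and a NumPy array.
state = {"noise_model": "ar1", "betas": np.array([0.5, -1.0, 2.0])}

# JSON has no representation for ndarray, so this raises TypeError.
try:
    json.dumps(state)
    json_error = None
except TypeError as exc:
    json_error = str(exc)

# One possible layout: split the state by type.
metadata = {k: v for k, v in state.items() if not isinstance(v, np.ndarray)}
arrays = {k: v for k, v in state.items() if isinstance(v, np.ndarray)}

metadata_text = json.dumps(metadata)  # text part
buffer = io.BytesIO()                 # stands in for an .npz file on disk
np.savez(buffer, **arrays)            # binary part

# Reassemble the state from the two parts.
buffer.seek(0)
restored = dict(json.loads(metadata_text))
with np.load(buffer) as npz:
    restored.update({name: npz[name] for name in npz.files})
```

Even this toy case needs two files and a convention for recombining them; masks, images, and nested objects would each need a similar decision.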

Serializing them with joblib is a good solution for temporary storage, caching or sending them over a network. However, it may be risky for long-term storage (for example you may have difficulties loading them again with future versions of Nistats). Therefore it may be worthwhile to also store the information you are interested in, for example computed contrasts, in standard formats such as .nii.

PeerHerholz commented 5 years ago

Oh I see, sorry for not thinking this through. You're completely right in that nistats objects are rather complex. Would hdf5 be a possible option re long-term storage/support? Sure, it won't overcome the problem re future nistats versions, but it could be worth checking out.

kchawla-pi commented 5 years ago

I am not sure that is a very common use case. We haven't received requests or feedback for this feature. Building model storage using HDF5 into Nistats will also mean maintaining it for a while.

Unless there's a significant demand for the feature or growing consensus and demonstration that the feature will improve the science (for example reproducibility), that's a lot of resources to commit.

I don't expect us to implement this as an integral feature in the foreseeable future.

PeerHerholz commented 5 years ago

Ok, thank you very much for your thoughts and input on this. If it's not something that a lot of folks are interested in or asking for, then the effort won't be worth it at this point.

Do you mind if I play around with it in an independent repo?

kchawla-pi commented 5 years ago

You don't even need to ask us what you do in your fork :smile: . Please go ahead.

PeerHerholz commented 5 years ago

True that, but I just wanted to be sure, as I don't want to start anything you, as the main developers, are completely against.

I'm closing this issue now. Thanks again for all the feedback, input and ideas.