ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License
135 stars 64 forks source link

Allow for complete model reloading #42

Open sea-shunned opened 1 year ago

sea-shunned commented 1 year ago

Background

At present, there are some variables (e.g. ml_f_EM) that are not returned by run_sustain_algorithm but are saved in the pickle files. It doesn't make sense to me that running the model gives you different outputs from loading the pickle file that you get from running it. The run_sustain_algorithm should just output what gets saved (and should probably output it as a dict to avoid having to refer to the order in which things are output).

Further to this, when trying to load previous results from a pickle file, a lot of the same setup is still required as when running the model in the first place, which can result in a lot of unnecessary boilerplate. There is some code that indicates there was once a desire for this. If anybody knows why this wasn't pursued or if there is some obstacle I haven't noticed please let me know.

Solutions

1. Expand what's pickled and reconstruct model instance

A @classmethod to recreate the model instance from a pickle file, resulting in the same thing as if you were to run the method (with Z_vals etc. bundled in), would simplify the average workflow significantly, and would be a fairly simple change. The main issue with this is that it would probably break people's existing code, and would require the notebook(s) to be updated. It would, however, keep to the current concept (pickle the results/arrays).

2. Modify how pySuStaIn pickles and save/load model in its entirety

We could also just pickle/unpickle the model instance itself, following a few changes to enable this. The process should also fix #27 and #41, on top of addressing the above (and simplifying things a lot). I reckon it should be doable without too much bother. Further details/considerations can be found here. This would also be best-served by turning the arrays that are usually pickled into attributes, and so would lead to a lot of small changes through the code.

Practicalities

If core maintainers agree, I'll go ahead with it, but if not then I will leave it.

ayoung11 commented 1 year ago

It's a great idea, particularly the second part. Thanks a lot for offering to code it up, think it would be worth discussing before commiting too much time to it. Agree that it would be best if it forms part of a versioned update, together with new notebooks (we should also start archiving those so people can find examples for the version they're using). Let's try to set up the quarterly pySuStaIn developers meeting we talked about ages ago and discuss it there - I'll send round an email.

sea-shunned commented 1 year ago

Sounds like a plan, I'll keep an eye out for your email!