Open sea-shunned opened 1 year ago
It's a great idea, particularly the second part. Thanks a lot for offering to code it up, think it would be worth discussing before commiting too much time to it. Agree that it would be best if it forms part of a versioned update, together with new notebooks (we should also start archiving those so people can find examples for the version they're using). Let's try to set up the quarterly pySuStaIn developers meeting we talked about ages ago and discuss it there - I'll send round an email.
Sounds like a plan, I'll keep an eye out for your email!
Background
At present, there are some variables (e.g.
ml_f_EM
) that are not returned byrun_sustain_algorithm
but are saved in the pickle files. It doesn't make sense to me that running the model gives you different outputs from loading the pickle file that you get from running it. Therun_sustain_algorithm
should just output what gets saved (and should probably output it as adict
to avoid having to refer to the order in which things are output).Further to this, when trying to load previous results from a pickle file, a lot of the same setup is still required as when running the model in the first place, which can result in a lot of unnecessary boilerplate. There is some code that indicates there was once a desire for this. If anybody knows why this wasn't pursued or if there is some obstacle I haven't noticed please let me know.
Solutions
1. Expand what's pickled and reconstruct model instance
A
@classmethod
to recreate the model instance from a pickle file, resulting in the same thing as if you were to run the method (withZ_vals
etc. bundled in), would simplify the average workflow significantly, and would be a fairly simple change. The main issue with this is that it would probably break people's existing code, and would require the notebook(s) to be updated. It would, however, keep to the current concept (pickle the results/arrays).2. Modify how pySuStaIn pickles and save/load model in its entirety
We could also just pickle/unpickle the model instance itself, following a few changes to enable this. The process should also fix #27 and #41, on top of addressing the above (and simplifying things a lot). I reckon it should be doable without too much bother. Further details/considerations can be found here. This would also be best-served by turning the arrays that are usually pickled into attributes, and so would lead to a lot of small changes through the code.
Practicalities
If core maintainers agree, I'll go ahead with it, but if not then I will leave it.