jbloomlab / polyclonal

Model mutational escape from polyclonal antibodies.

somehow automating pickling of models #26

Closed by jbloom 2 years ago

jbloom commented 2 years ago

@matsen and team, this is a Python question for you.

If you look in @timcyu's notebooks, he often has code like in cell [3] of this notebook. Essentially, he manually builds a string that describes the model in that situation (e.g., how many variants, or what concentrations), fits that model, then pickles it and keeps track of which descriptive strings have already been fit so they don't have to be re-fit.

Is there a general way to automate this process into a function for the package? The issue of course is that @timcyu's strings don't actually fully describe the model: e.g., they don't describe the input data or other parameters associated with the model that aren't being varied; he's just manually included what happens to be important for this case.

To generically automate this, we'd somehow need to be able to key a dict or database with a hashable key that uniquely describes the entire model and associated data set. Then every time the user goes to fit a new model, the function would look up whether that hashable key corresponds to a model that has already been fit and stored in that database.
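One way this lookup might be sketched, using only the standard library: hash the parameters together with the data to get a key, and use that key as the filename of the pickled model. Everything named here (`model_key`, `fit_or_load`, `toy_fit`, the parameter `n_epitopes`, and the toy data rows) is hypothetical and not part of the package; the sketch also assumes the parameters and data can be serialized to JSON, which would not hold for a real data frame without some extra conversion.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def model_key(params, data_rows):
    """Return a hex digest that changes if any parameter or data row changes."""
    # sort_keys makes the JSON text independent of dict insertion order
    payload = json.dumps({"params": params, "data": data_rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fit_or_load(params, data_rows, fit_func, cache_dir):
    """Load the pickled model for this key if it exists, else fit and pickle it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{model_key(params, data_rows)}.pickle"
    if path.exists():
        with open(path, "rb") as f:
            return pickle.load(f)
    model = fit_func(params, data_rows)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model


# Toy stand-in for an expensive fit (an assumption for illustration);
# the counter records how often a fit is actually run.
n_fits = 0


def toy_fit(params, data_rows):
    global n_fits
    n_fits += 1
    return {"n_epitopes": params["n_epitopes"], "n_rows": len(data_rows)}


cache = tempfile.mkdtemp()
rows = [["A1G", 0.25, 0.9], ["K2T", 0.25, 0.1]]
first = fit_or_load({"n_epitopes": 2}, rows, toy_fit, cache)
second = fit_or_load({"n_epitopes": 2}, rows, toy_fit, cache)  # loaded from disk
third = fit_or_load({"n_epitopes": 3}, rows, toy_fit, cache)   # new key, so refit
```

Because the key is derived from the full parameter set and the data rather than a hand-written string, a change to any input produces a different file, and the cache survives between sessions since it lives on disk.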

Is this sort of thing possible?

matsen commented 2 years ago

Briefly (about to go get the girls):

You are generally talking about what is known as "memoization". I noted that you do it for your objective function too, providing a LossReg wrapper to avoid unnecessary calls.

There is some general functionality for this in Python. I know that @WSDeWitt has used the lru_cache decorator to good effect. I think you could use this by just making a free function that ingests all of the data needed and produces a Polyclonal object. If you want this to persist between sessions that would take more thought.
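A minimal sketch of the `lru_cache` idea, assuming a free function that takes everything the fit depends on. The function `fit_model`, its parameters `n_epitopes` and `reg_strength`, and the returned dict are hypothetical stand-ins, not the package's API; a real version would build and fit a Polyclonal object instead. Note that `lru_cache` requires every argument to be hashable, so the data is passed as a tuple of tuples here rather than as a data frame.

```python
from functools import lru_cache

# Counts how many times the body of fit_model actually runs.
calls = 0


@lru_cache(maxsize=None)
def fit_model(data, n_epitopes, reg_strength):
    """Hypothetical stand-in for building and fitting a model."""
    global calls
    calls += 1
    return {
        "n_epitopes": n_epitopes,
        "reg_strength": reg_strength,
        "n_variants": len(data),
    }


# Arguments must be hashable: tuples work, lists and data frames do not.
data = (("A1G", 0.9), ("K2T", 0.1))

m1 = fit_model(data, 2, 0.1)  # computed
m2 = fit_model(data, 2, 0.1)  # same arguments, returned from the cache
m3 = fit_model(data, 3, 0.1)  # different argument, computed again

info = fit_model.cache_info()
```

The cached call returns the very same object (`m1 is m2`), so mutating it affects every later lookup. The cache also lives only in memory, so it is lost when the session ends.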