pybamm-team / PyBaMM

Fast and flexible physics-based battery models in Python
https://www.pybamm.org/
BSD 3-Clause "New" or "Revised" License

Cannot pickle model when format is "python" #1283

Closed tlestang closed 3 years ago

tlestang commented 3 years ago

Description

Instances of BaseModel and derived classes cannot be pickled when attribute convert_to_format is "python". This hampers use of multiprocessing, since it relies on pickling objects for passing them between processes, see #1261 .

To reproduce

import pickle
import pybamm

model = pybamm.lithium_ion.SPMe()
geometry = model.default_geometry
param = model.default_parameter_values
param.process_model(model)
param.process_geometry(geometry)
mesh = pybamm.Mesh(geometry, model.default_submesh_types, model.default_var_pts)
# discretise model
disc = pybamm.Discretisation(mesh, model.default_spatial_methods)
disc.process_model(model)

model.convert_to_format = "python"
model.default_solver.set_up(model)

try:
    with open("model.pickle", mode="wb") as f:
        pickle.dump(model, f)
except pickle.PicklingError as e:
    print(e)
Can't pickle <function evaluate at 0x7f1c4c5d3280>: attribute lookup evaluate on pybamm.expression_tree.operations.evaluate failed

Details

The function base_solver.set_up does several things; in particular, it creates the SolverCallable objects the solver relies on to evaluate the RHS, initial conditions and Jacobian.

# base_solver.py

# Add the solver attributes
model.init_eval = init_eval # SolverCallable instance
model.rhs_eval = rhs_eval
model.algebraic_eval = algebraic_eval
model.jac_algebraic_eval = jac_algebraic
model.terminate_events_eval = terminate_events_eval
model.discontinuity_events_eval = discontinuity_events_eval

# Calculate initial conditions
model.y0 = init_eval(inputs)

When model.convert_to_format == "python", SolverCallable.__call__ is a wrapper around EvaluatorPython.evaluate() which evaluates the Python code generated in EvaluatorPython.__init__ to translate the expression tree. When an EvaluatorPython instance is created, a global function evaluate is defined by compiling its definition on the fly. This function is then pointed to by EvaluatorPython._evaluate.

According to the docs for pickle, only functions defined at the top level of a module can be pickled, and I think this is why evaluate cannot be pickled:

Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in.

Note that evaluate is a global function and not a local function of EvaluatorPython.__init__, i.e. print(self._evaluate) shows <function evaluate at 0x7fa4fee843a0> and not <function EvaluatorPython.__init__.<locals>.evaluate at 0x7fa4fee843a0>. However, since evaluate is compiled on the fly, I don't think pickle can find the name of the module it is defined in…

Minimal example:

import pickle

python_str = """def dummy_function():
    return "Hello"
self._method = dummy_function
"""

class myclass():
    def __init__(self):
        compiled_function = compile(python_str, "", "exec")
        exec(compiled_function)

inst = myclass()
with open("data.pickle", mode="w+b") as f:
    try:
        pickle.dump(inst, f)
    except pickle.PicklingError as e:
        print(e)
Can't pickle <function dummy_function at 0x7fc39202ec10>: attribute lookup dummy_function on __main__ failed

Models can be pickled fine when model.convert_to_format == "casadi", since in this case SolverCallable.function calls an instance of casadi.Function, which can be pickled.

martinjrobins commented 3 years ago

couple of points on this:

  1. should the base solver be instead storing these "compiled" models in the solver itself? The user probably isn't expecting the solver to modify their model when you call set_up.
  2. there is no particular reason that the compiled evaluate needs to be pickled, can we just tell pickle not to include it? The EvaluatorPython class can instead store a string of the function to be compiled (I think it might already do this), and if the compiled version doesn't exist when EvaluatorPython.evaluate is called, compile it then.
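A minimal sketch of the second idea (illustrative names, not PyBaMM's actual API): keep only the generated source string, compile it lazily on first call, and drop the compiled function in __getstate__ so the instance pickles cleanly.

```python
import pickle


class LazyEvaluator:
    """Illustrative sketch: stores generated source, compiles lazily,
    and omits the unpicklable compiled function when pickled."""

    def __init__(self, source):
        self._source = source
        self._evaluate = None  # compiled on first call

    def __call__(self, *args):
        if self._evaluate is None:
            namespace = {}
            exec(compile(self._source, "<generated>", "exec"), namespace)
            self._evaluate = namespace["evaluate"]
        return self._evaluate(*args)

    def __getstate__(self):
        # Keep the source string, drop the compiled function.
        state = self.__dict__.copy()
        state["_evaluate"] = None
        return state


ev = LazyEvaluator("def evaluate(x):\n    return 2 * x\n")
print(ev(21))                    # 42
clone = pickle.loads(pickle.dumps(ev))
print(clone(21))                 # 42, recompiled after unpickling
```

Unpickling restores only `_source` (the default `__setstate__` fills `__dict__`), so the first call on the restored object triggers a fresh compile.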
valentinsulzer commented 3 years ago

The reason the compiled models are stored in the model and not the solver itself is that we allow the same solver to be used to solve different models; if the compiled models were stored in the solver, solving a second model would overwrite the first one and require a new setup. As a solution we could either store the compiled models in the solver in a dictionary keyed by model, or enforce one model per solver.

martinjrobins commented 3 years ago

Yup, agree that this functionality is useful. It would also be nice to be able to use multiple solvers on the same model. I've run into this a number of times, where I've wanted to swap between different solvers for the same model because they worked better/worse in different areas of parameter space. But I can't easily do this because I need to create a whole new model for each solver.

Your second option, enforcing only one model per solver, might be the safest approach. If we use a dictionary a user could potentially: (a) create a solver and set it up with a model, (b) modify the model in order to create what they believe to be a new model, then (c) setup the same solver with this "new" model. Do you think we need to worry about that?
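The dictionary option and its hazard can be sketched with stand-in classes (hypothetical, not PyBaMM's real API): because a plain object's default hash is identity-based, mutating the model in place still hits the old cache entry.

```python
# Hypothetical sketch: the solver caches compiled artifacts keyed by
# the model instance itself.
class Solver:
    def __init__(self):
        self._compiled = {}  # model -> compiled form

    def set_up(self, model):
        # Default object hashing is identity-based, so a model mutated
        # in place is still "the same key" and reuses the stale entry.
        if model not in self._compiled:
            self._compiled[model] = "compiled({})".format(id(model))
        return self._compiled[model]


class Model:
    pass


solver = Solver()
m = Model()
first = solver.set_up(m)
m.some_parameter = 2.0       # user "modifies" the model in place
second = solver.set_up(m)
print(first == second)       # True: the stale cache entry is reused
```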

valentinsulzer commented 3 years ago

For the second option, it depends how hashing works - if you change an attribute of a python object (or an attribute of an attribute of an attribute ...), does it change the hash?
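For the default case the answer is no: a plain Python object hashes by identity, so mutating its attributes (at any depth) leaves the hash unchanged, as this small check shows.

```python
class Model:
    pass


m = Model()
before = hash(m)       # default __hash__ is derived from id(m)
m.timescale = 2.0      # mutate an attribute
m.options = {"thermal": "lumped"}
after = hash(m)
print(before == after)  # True: attributes do not affect the default hash
```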

martinjrobins commented 3 years ago

We would have to implement the hashing function for a pybamm.Model, using the __hash__ magic method (or the __repr__ one). So it depends on whether we can create a good hashing function for a model.
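A hedged sketch of a repr-based hash (illustrative class, not PyBaMM code). Note the flip side: a content-based hash changes when the model is mutated, which is exactly what would invalidate it as a dictionary key.

```python
class HashableModel:
    """Illustrative content-based hash via __repr__."""

    def __init__(self, name, options):
        self.name = name
        self.options = options

    def __repr__(self):
        # Sort the option items so the repr (and hence the hash)
        # is deterministic.
        return f"HashableModel({self.name!r}, {sorted(self.options.items())!r})"

    def __hash__(self):
        return hash(repr(self))


m = HashableModel("SPMe", {"thermal": "isothermal"})
h1 = hash(m)
m.options["thermal"] = "lumped"  # mutation now changes the hash...
h2 = hash(m)
print(h1 != h2)  # True -- so the old dictionary entry becomes unreachable
```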

tlestang commented 3 years ago

Sorry I didn't notice #734 .

there is no particular reason that the compiled evaluate needs to be pickled, can we just tell pickle not to include it?

I skimmed through the documentation for pickle and it looks to me like this would be possible. I'll read it in more detail and give it a try.

Otherwise, I don't (yet?) have a strong opinion on whether compiled models should be attributes of the solver or of the model instances. I'm not sure I understand why enforcing one model per solver is the best option if we want to keep it possible to solve several models with the same solver, or to use multiple solvers with one model. @martinjrobins why do you need to create a new model each time you change solver? Isn't solver.set_up independent of the solver type?

martinjrobins commented 3 years ago

At the moment, you can't use multiple solvers with a model because each solver stores its temporary files (i.e. compiled models) in that model. Plus you need to set the model.convert_to_format appropriately for the solver you want to use.

It would be nice to be able to do this:

model = create_and_discretise_model()
solvers = [pybamm.CasadiSolver(), pybamm.JaxSolver(), pybamm.ScipySolver()]
[solver.set_up(model) for solver in solvers]

or perhaps this,

model = create_and_discretise_model()
solver_classes = [pybamm.CasadiSolver, pybamm.JaxSolver, pybamm.ScipySolver]
solvers = [solver_class(model) for solver_class in solver_classes]

but as far as I can see you can't as each call to set_up will clobber the previous solver's compiled models (whether they are compiled via casadi, python or jax)
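The clobbering can be sketched with stand-in classes (hypothetical, not PyBaMM's API): each solver's set_up writes to the same attributes on the shared model, so the last call wins.

```python
class Model:
    pass


class SolverA:
    def set_up(self, model):
        model.rhs_eval = "compiled by A"   # stand-in for real codegen


class SolverB:
    def set_up(self, model):
        model.rhs_eval = "compiled by B"


m = Model()
SolverA().set_up(m)
SolverB().set_up(m)
print(m.rhs_eval)  # "compiled by B": SolverA's compiled form was clobbered
```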

valentinsulzer commented 3 years ago

Fixed by #1298