Closed rlange2 closed 4 years ago
Thanks @rlange for opening this.
Some comments:
Let's first implement the callback system in one PR, and then add a progress bar built on top of that in another PR.
drivers.BaseSimulationDriver
is not the right place to implement all the callback mechanism. It's better to separate the concerns and have a separate base class for callbacks. Actually, you could almost copy the entire callback.py
module from dask, and just replace the dask events (i.e., start
, start_state
, pretask
, etc) by xarray-simlab's simulation events. Then, we can implement later a progress bar as a class that inherits from Callback
, just like dask's ProgressBar.
Information about which step is running, etc. is gathered in drivers.RuntimeContext
I think a good signature for all callback functions in our case would be:
def event(runtime_context, store):
# ...
runtime_context
and store
should both be instances of some FrozenMapping
class that needs to be implemented. Here, "frozen" basically means "read-only", as changing runtime data or simulation data from the callbacks is a bad idea. This link is a good basis for implementing a FrozenMapping
wrapper around a mutable mapping.
drivers.BaseSimulationDriver
is not the best place for the actual call to the callback functions. A level down, i.e., Model.execute(..., callbacks=None)
would be better. Then, you can propagate the callbacks from xr_accessor.SimlabAccessor.run(..., callbacks=None)
(public API entry point for the xarray extension) to Model.execute()
, via drivers.XarraySimulationDrivers.__init__(..., callbacks)
.Compared to Dask's tasks, in xarray-simlab we might end-up with many possible events resulting from a combination of
There is thus 16 possible events. With some imagination I'm sure we can find relevant use cases for every one of them. That's certainly too much to hard code every event in the class Callback
.
Alternatively, the signature for directly creating a Callback
instance from callables may look like:
def func1(runtime, store):
# ...
def func2(runtime, store):
# ...
clb_funcs = {
('initialize', 'model', 'pre'): func1,
('run_step', 'process', 'post'): func2
}
clb = Callback(funcs=clb_funcs)
So events are specified using a ('stage', 'model_vs_process', 'pre_vs_post')
tuple.
Subclassing Callback
may look like (using a decorator):
class MyCallback(Callback):
@event('initialize', 'model', 'pre')
def meth1(self, runtime, store):
# ...
@event('run_step', 'process', 'post')
def meth2(self, runtime, store):
# ...
If we use a decorator to specify events, we could even support:
@event('initialize', 'model', 'pre')
def func1(runtime, store):
# ...
@event('run_step', 'process', 'post')
def func2(runtime, store):
# ...
clb = Callback(funcs=[func1, func2])
black . && flake8
whats-new.rst
for all changes andapi.rst
for new APIThe idea behind this PR is to establish a callback systems that allows using callback mechanisms, e.g. for a progress bar, runtime diagnostics... For now, this is a work in progress as it is not functioning at the moment. I would rather use this PR to stepwise implement this feature. However, I hope the code gives a good impression of what it is trying to accomplish. The basic concept is implemented in
drivers.BaseSimulationDriver
.self._start
,self._prestep
andself._stop
(_finish
might be more fitting) should later be used to get information about the current run (e.g. in the classProgressBar
but generally any callback mechanism that will be implemented). I think my biggest issue here is so 'link' these instance variables to any simulation that is running. This becomes clear when looking atdrivers.ProgressBar
. Currently, they don't serve any real purpose and I'm not even sure they can get the needed information since all of that is set up indrivers.XarraySimulationDriver
. I'm also not sure how detailed the individual stages are defined since any run follows the following concept:I don't intent to change the overall modelling logic so instead of having information about what stage is running, finished or waiting to run (as what I was trying to do with the
steps
dictionary in_update_bar()
), we might also use step-data fromds_gby_steps
(returned by_get_runtime_datasets()
- though, these might not be helpful regardinginitialize
andfinalize
). To summarise, I think what I'm missing is a data type that I can use to retrieve information. I can imagine that it is already there and I'm simply not aware of.Documentation is missing for now since I expect that this routine will change quite a lot in the future.
Lastly, some callables might be better suited to
xsimlab.drivers
but I'm afraid that would lead to circular imports.This might take me some time, so I welcome any proposals and criticisms :)