WIP: Callback system and progress bar

rlange2 commented 4 years ago

[ ] Closes #65
[ ] Tests added
[ ] Passes black . && flake8
[ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

The idea behind this PR is to establish a callback systems that allows using callback mechanisms, e.g. for a progress bar, runtime diagnostics... For now, this is a work in progress as it is not functioning at the moment. I would rather use this PR to stepwise implement this feature. However, I hope the code gives a good impression of what it is trying to accomplish. The basic concept is implemented in drivers.BaseSimulationDriver. self._start, self._prestep and self._stop (_finish might be more fitting) should later be used to get information about the current run (e.g. in the class ProgressBar but generally any callback mechanism that will be implemented). I think my biggest issue here is so 'link' these instance variables to any simulation that is running. This becomes clear when looking at drivers.ProgressBar. Currently, they don't serve any real purpose and I'm not even sure they can get the needed information since all of that is set up in drivers.XarraySimulationDriver. I'm also not sure how detailed the individual stages are defined since any run follows the following concept:

initialize all components
iteration over run_step and finalize_step
finalize

I don't intent to change the overall modelling logic so instead of having information about what stage is running, finished or waiting to run (as what I was trying to do with the steps dictionary in _update_bar()), we might also use step-data from ds_gby_steps(returned by _get_runtime_datasets() - though, these might not be helpful regarding initialize and finalize). To summarise, I think what I'm missing is a data type that I can use to retrieve information. I can imagine that it is already there and I'm simply not aware of.

Documentation is missing for now since I expect that this routine will change quite a lot in the future.

Lastly, some callables might be better suited to xsimlab.drivers but I'm afraid that would lead to circular imports.

This might take me some time, so I welcome any proposals and criticisms :)

benbovy commented 4 years ago

Thanks @rlange for opening this.

Some comments:

Let's first implement the callback system in one PR, and then add a progress bar built on top of that in another PR.
drivers.BaseSimulationDriver is not the right place to implement all the callback mechanism. It's better to separate the concerns and have a separate base class for callbacks. Actually, you could almost copy the entire callback.py module from dask, and just replace the dask events (i.e., start, start_state, pretask, etc) by xarray-simlab's simulation events. Then, we can implement later a progress bar as a class that inherits from Callback, just like dask's ProgressBar.
Information about which step is running, etc. is gathered in drivers.RuntimeContext
I think a good signature for all callback functions in our case would be:

def event(runtime_context, store):
    # ...

runtime_context and store should both be instances of some FrozenMapping class that needs to be implemented. Here, "frozen" basically means "read-only", as changing runtime data or simulation data from the callbacks is a bad idea. This link is a good basis for implementing a FrozenMapping wrapper around a mutable mapping.

Again, drivers.BaseSimulationDriver is not the best place for the actual call to the callback functions. A level down, i.e., Model.execute(..., callbacks=None) would be better. Then, you can propagate the callbacks from xr_accessor.SimlabAccessor.run(..., callbacks=None) (public API entry point for the xarray extension) to Model.execute(), via drivers.XarraySimulationDrivers.__init__(..., callbacks).

benbovy commented 4 years ago

Compared to Dask's tasks, in xarray-simlab we might end-up with many possible events resulting from a combination of

a given simulation stage
model vs. process level
pre vs. post.

There is thus 16 possible events. With some imagination I'm sure we can find relevant use cases for every one of them. That's certainly too much to hard code every event in the class Callback.

Alternatively, the signature for directly creating a Callback instance from callables may look like:

def func1(runtime, store):
    # ...

def func2(runtime, store):
    # ...

clb_funcs = {
    ('initialize', 'model', 'pre'): func1,
    ('run_step', 'process', 'post'): func2
}

clb = Callback(funcs=clb_funcs)

So events are specified using a ('stage', 'model_vs_process', 'pre_vs_post') tuple.

Subclassing Callback may look like (using a decorator):

class MyCallback(Callback):

    @event('initialize', 'model', 'pre')
    def meth1(self, runtime, store):
        # ...

    @event('run_step', 'process', 'post')
    def meth2(self, runtime, store):
        # ...

benbovy commented 4 years ago

If we use a decorator to specify events, we could even support:

@event('initialize', 'model', 'pre')
def func1(runtime, store):
    # ...

@event('run_step', 'process', 'post')
def func2(runtime, store):
    # ...

clb = Callback(funcs=[func1, func2])

xarray-contrib / xarray-simlab

WIP: Callback system and progress bar #93