xarray-contrib / xarray-simlab

Xarray extension and framework for computer model simulations
http://xarray-simlab.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Progress bar implementation #65

Closed rlange2 closed 4 years ago

rlange2 commented 5 years ago

I wrote a process that indicates the progress of the simulation.

import xsimlab as xs


@xs.process()
class Progress:
    """Simple CLI progress bar based on runtime arguments."""

    # 'inout' variables are model inputs, so each needs an initial value
    # at setup (e.g. count=0).
    count = xs.variable(description="time elapsed", intent='inout')
    time = xs.variable(description="current progress", intent='inout')
    stepper = xs.variable(description="progress marker", intent='inout')
    remainder = xs.variable(description="reverse stepper", intent='inout')

    @xs.runtime(args=['step_delta', 'sim_end'])
    def run_step(self, dt, sim_end):
        # Accumulate elapsed simulation time and convert it to a percentage.
        self.count += dt
        self.time = round((self.count / sim_end) * 100, 2)
        self.stepper = "#" * int(self.time)
        self.remainder = "." * int(100 - self.time)

    def finalize_step(self):
        # '\r' returns to the start of the line so the bar is redrawn in place.
        print(f"Computing: [{self.stepper}{self.remainder}] {self.time}%", end='\r')

    def finalize(self):
        print("\n")

I figured that, by means of the runtime decorator, it is possible to extract information on the number of iterations and the end of the simulation. The output is a simple command-line version of a progress bar. Of course, that means the progress bar covers only the iterative portion of the simulation, namely run_step and finalize_step, and includes neither initialize nor finalize. While it seems acceptable to neglect the former (the progress bar won't be displayed before run_step anyway), the very last stage of the model may need additional time that is not considered here.

I haven't noticed this in any of the given tutorials. However, I'm currently running a rather large model over a period of one billion years in steps of 20k years (whether that makes sense remains to be seen). This already takes quite long on my machine (~50 minutes), while finalize takes ~160 seconds. I feel that gives the wrong impression of the actual progress, so I might supply additional output for finalize to make it clearer.

Since this is a process, it's up to the user to include it in their model. Unfortunately, I don't understand the internals of xsimlab well enough (yet) to come up with a proper implementation idea. My educated guess would be to write a separate progress bar class and either pass it to class Model or make it a property of class Model.

During all my tests, I haven't noticed the process contributing to overhead.

Edit: For the process, something close to the following might do the trick:

def finalize_step(self):
    if self.time < 100:
        print(f"Computing: [{self.stepper}{self.remainder}] {self.time}%", end='\r')
    else:
        print(f"Computing: [{self.stepper}{self.remainder}] {self.time}%, Finalizing...", end='\r')

def finalize(self):
    print("\nSimulation finished.\n")

Also, I'm curious whether sys.stdout.write() would be preferable to print() in this context.
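For a single-line bar the two calls behave much the same: print() writes to sys.stdout by default, so the main practical difference is how explicitly the output is flushed. Below is a minimal sketch, independent of xsimlab, that redraws a bar with sys.stdout.write(); `render_bar` and `show` are hypothetical helper names, not part of any library.

```python
import sys


def render_bar(percent, width=50):
    """Return a one-line text progress bar for a percentage in [0, 100]."""
    percent = max(0.0, min(100.0, percent))  # clamp out-of-range values
    filled = int(width * percent / 100)
    return f"Computing: [{'#' * filled}{'.' * (width - filled)}] {percent:.2f}%"


def show(percent, stream=sys.stdout):
    """Redraw the bar in place on the given stream."""
    # "\r" moves the cursor back to the start of the line; the explicit
    # flush makes the update visible even when the stream is buffered.
    stream.write("\r" + render_bar(percent))
    stream.flush()


if __name__ == "__main__":
    for pct in (0, 25, 50, 75, 100):
        show(pct)
    sys.stdout.write("\n")
```

The same effect should be achievable with print(render_bar(pct), end='\r', flush=True), so the choice is largely a matter of taste.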

benbovy commented 5 years ago

Actually, I haven't thought about implementing some progress diagnostics simply as process classes. This is a great idea! It would be nice to show a simple example in the documentation.

That said, I think it would be nice to also provide a separate mechanism for this purpose. Progress bars (and other runtime diagnostics such as profilers, logging, etc.) are rather independent of models, so it would make sense not to have to include them explicitly in models.

A clean and common approach would be to first implement a callback system and plug it into the driver classes (in drivers.py). A second step would then be to implement a progress bar.
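To make the idea concrete, here is a rough sketch of a step loop that fires user-supplied hooks around each step. It is plain Python and does not use xsimlab's actual driver classes; `StepCallback`, `run_steps` and the context keys are all hypothetical names chosen for illustration.

```python
class StepCallback:
    """Bundle of optional hooks called around each simulation step."""

    def __init__(self, pre_step=None, post_step=None):
        self.pre_step = pre_step
        self.post_step = post_step


def run_steps(n_steps, step_func, callbacks=()):
    """Toy driver loop: call step_func once per step, firing hooks around it."""
    state = {}
    for step in range(n_steps):
        context = {"step": step, "n_steps": n_steps}
        for cb in callbacks:
            if cb.pre_step is not None:
                cb.pre_step(context, state)
        step_func(context, state)
        for cb in callbacks:
            if cb.post_step is not None:
                cb.post_step(context, state)
    return state


if __name__ == "__main__":
    def step(context, state):
        # Stand-in for the model's run_step.
        state["total"] = state.get("total", 0) + context["step"]

    def report(context, state):
        done = 100 * (context["step"] + 1) / context["n_steps"]
        print(f"{done:.0f}% done")

    run_steps(4, step, callbacks=[StepCallback(post_step=report)])
```

With this separation the model code stays unaware of diagnostics: a progress bar becomes just one more callback handed to the driver.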

We might get inspiration from those different sources:

benbovy commented 4 years ago

I think we can reuse much of Dask's implementation of callbacks here (see https://github.com/dask/dask/blob/7b4f90f0a708cbb97ebb498c78f7931d82ec99db/dask/callbacks.py), e.g.,

def progress(runtime_context, store):
    print(runtime_context['step'])

my_callback = xsimlab.Callback(pre_step=progress, post_step=None)

# option 1
dataset.xsimlab.run(model=model, callbacks=[my_callback])

# option 2
with xsimlab.add_callbacks(my_callback):
    out_ds1 = dataset.xsimlab.run(model=model)
    out_ds2 = dataset.xsimlab.run(model=model2)

# option 3
my_callback.register()

out_ds1 = dataset.xsimlab.run(model=model)
out_ds2 = dataset.xsimlab.run(model=model2)

my_callback.unregister()
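The three options above differ only in how long a callback stays attached. The following sketch shows one way the registration logic could work, loosely following the pattern of a class-level set of active callbacks. It is plain Python, not xsimlab code: `Callback`, `add_callbacks` and `run` are hypothetical stand-ins that mirror the names in the proposal.

```python
from contextlib import contextmanager


class Callback:
    """Hypothetical callback holder with global registration."""

    active = set()  # callbacks currently registered for every run

    def __init__(self, pre_step=None, post_step=None):
        self.pre_step = pre_step
        self.post_step = post_step

    def register(self):
        Callback.active.add(self)

    def unregister(self):
        Callback.active.discard(self)


@contextmanager
def add_callbacks(*callbacks):
    """Register callbacks for the duration of a with-block only."""
    newly_added = [cb for cb in callbacks if cb not in Callback.active]
    for cb in newly_added:
        cb.register()
    try:
        yield
    finally:
        # Only undo registrations made here, even if the block raised.
        for cb in newly_added:
            cb.unregister()


def run(n_steps, callbacks=()):
    """Toy stand-in for a model run: uses explicit plus registered callbacks."""
    all_callbacks = list(callbacks) + [
        cb for cb in Callback.active if cb not in callbacks
    ]
    for step in range(n_steps):
        context = {"step": step}
        for cb in all_callbacks:
            if cb.pre_step is not None:
                cb.pre_step(context, None)
        for cb in all_callbacks:
            if cb.post_step is not None:
                cb.post_step(context, None)


if __name__ == "__main__":
    progress = Callback(pre_step=lambda ctx, store: print(ctx["step"]))
    run(2, callbacks=[progress])      # option 1: per-run argument
    with add_callbacks(progress):     # option 2: scoped registration
        run(2)
    progress.register()               # option 3: manual registration
    run(2)
    progress.unregister()
```

Option 2 is the safest of the three for interactive use, since the callback is removed even when a run raises an exception.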