michelbierlaire / biogeme

Biogeme is an open source freeware designed for the maximum likelihood estimation of parametric models in general, with a special emphasis on discrete choice models.

Callback system to the optimisation #21

Open tcapelle opened 1 year ago

tcapelle commented 1 year ago

Hello Michel,

I am curious how easy it would be to add a callback system to the optimization in biogeme. The idea would be to be able to add experiment tracking like W&B, so we can keep track of the different experiments in a central place. Basically, you would need to be able to call:

At the start: log(config), with the initial hyperparameters.
At the end of each iteration/step: log(metric), so we get nice training curves and visual metrics in the workspace.
After finishing: store the relevant validation results or plots.
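
The three hooks listed above could be sketched as a small callback interface. This is only an illustration under stated assumptions: `OptimizationCallback`, `RecordingCallback` and `toy_minimize` are hypothetical names, not part of biogeme or biogeme_optimization, and the toy gradient descent merely stands in for the real optimizer.

```python
from __future__ import annotations

from typing import Any, Protocol


class OptimizationCallback(Protocol):
    """Hypothetical hook interface (an assumption, not a biogeme API)."""

    def on_start(self, config: dict[str, Any]) -> None: ...
    def on_iteration(self, step: int, metrics: dict[str, float]) -> None: ...
    def on_end(self, results: dict[str, Any]) -> None: ...


class RecordingCallback:
    """Keeps everything in memory. A tracker-backed variant could forward
    the same dictionaries to calls such as wandb.init(config=...) and
    wandb.log(...)."""

    def __init__(self) -> None:
        self.config: dict[str, Any] = {}
        self.history: list[tuple[int, dict[str, float]]] = []
        self.results: dict[str, Any] = {}

    def on_start(self, config: dict[str, Any]) -> None:
        self.config = dict(config)

    def on_iteration(self, step: int, metrics: dict[str, float]) -> None:
        self.history.append((step, dict(metrics)))

    def on_end(self, results: dict[str, Any]) -> None:
        self.results = dict(results)


def toy_minimize(x0: float, callback: OptimizationCallback, max_iter: int = 5) -> float:
    """Stand-in optimizer: fixed-step gradient descent on f(x) = x**2."""
    step_size = 0.25
    callback.on_start({"x0": x0, "step_size": step_size, "max_iter": max_iter})
    x = x0
    for k in range(max_iter):
        x -= step_size * 2.0 * x  # the gradient of x**2 is 2x
        callback.on_iteration(k, {"function": x * x, "x": x})
    callback.on_end({"x": x, "function": x * x})
    return x
```

With this shape, the optimizer only needs to accept one extra argument, and the choice of tracking tool stays outside the optimization code.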

PS: I am trying to convert my friend Ricardo, so he stops staring at the terminal logs...

michelbierlaire commented 5 months ago

Sorry. I did not follow up back then. Are you still interested in this?

tcapelle commented 4 months ago

I am! I need this to convert Ricardo =P. The idea is to have a dashboard with the metrics, inputs and outputs of the experiments that looks like this:

I am happy to discuss this over a call if you want =)

tcapelle commented 4 months ago

For /docs/examples/latentbis > python plot_m01_latent_variable.py:

[Screenshot: Workspace view]

[Screenshot: Config and overview]

tcapelle commented 4 months ago

I would need to raise PRs on both biogeme + biogeme_optimization

michelbierlaire commented 4 months ago

Well, I need to understand better what you would need. Currently, the object that gathers and processes the estimation results, as well as the reports about the iterations, is "bioResults". Would it make sense to enrich this object to contain additional data about the running of the algorithm?

tcapelle commented 4 months ago

There are 2 things:

If the optimizer exposed a logger, we could stream the metrics as the model fits. This is mostly done for "longer" fitting runs (deep learning); I don't know whether your experiments sometimes run longer than a couple of seconds.
Indeed, we could extend bioResults with a serialization method so we can dump it to W&B.
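
For the second point, a serialization method could look roughly like the sketch below. `EstimationSummary` and its fields are hypothetical stand-ins chosen for illustration; they are not the actual attributes of bioResults, which would need to be mapped onto a structure like this.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class EstimationSummary:
    """Hypothetical stand-in for quantities a results object might expose."""

    model_name: str
    final_loglikelihood: float
    n_iterations: int
    betas: dict[str, float] = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict() recursively converts the dataclass into plain Python types.
        return asdict(self)

    def to_json(self) -> str:
        # Sorted keys give a stable output that is easy to diff between runs.
        return json.dumps(self.to_dict(), sort_keys=True)
```

A plain dictionary of built-in types is the form most experiment trackers accept for a run summary, so `to_dict()` would be the piece handed over to the tracking tool.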

michelbierlaire commented 4 months ago

Most functions have a logger. They start with statements such as logger = logging.getLogger(__name__). See simple_bounds.py for instance. If you identify the quantities that are missing in the logger, and in bioResults, it should be easy to add them.
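
Since the modules already use the standard `logging` module, one possible way to capture metrics without touching the optimizer loop is a custom handler. The sketch below assumes the logging calls attach a `metrics` dictionary through the `extra` argument, which is not what the current code does; the logger name `demo_optimizer` and the class `MetricsHandler` are hypothetical.

```python
import logging


class MetricsHandler(logging.Handler):
    """Forwards records that carry a `metrics` dict to a sink callable."""

    def __init__(self, sink) -> None:
        super().__init__(level=logging.INFO)
        self.sink = sink  # any callable taking a dict, e.g. a tracker's log function

    def emit(self, record: logging.LogRecord) -> None:
        # `extra={"metrics": ...}` becomes an attribute on the record.
        metrics = getattr(record, "metrics", None)
        if isinstance(metrics, dict):
            self.sink(metrics)


# Hypothetical logger name; a real module would use logging.getLogger(__name__).
logger = logging.getLogger("demo_optimizer")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo quiet on the root logger

received = []
logger.addHandler(MetricsHandler(received.append))

# Only the first record carries metrics, so only it reaches the sink.
logger.info("iteration 0", extra={"metrics": {"relgrad": 0.5, "radius": 1.0}})
logger.info("no metrics attached")
```

The formatted text rows keep going to the usual handlers, while structured values travel alongside them to whatever sink is registered.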

tcapelle commented 4 months ago

Thanks, yeah, that's what I actually did for that example. I redefined logmessage:

    def logmessage() -> None:
        """Send messages to the logger"""
        import wandb

        # Values for the formatted text row, in column order.
        values_to_report = [k]
        if variable_names is not None:
            values_to_report += list(iterate)
        values_to_report += [
            current_function.function,
            the_function.relative_gradient_norm,
            float(radius),
            rho,
            status,
        ]
        # Stream the numeric metrics of this iteration to W&B.
        wandb.log(
            {
                "current_function": current_function.function,
                "relgrad": the_function.relative_gradient_norm,
                "radius": float(radius),
                "rho": rho,
            }
        )
        logger.info(the_formatter.formatted_row(values_to_report))

To stream the metrics in real time, we need to call wandb.log when the metric is computed, in this case the fitting metrics. If values_to_report were a dictionary, the logging would be extremely simple: wandb.log(values_to_report)
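
A dictionary-based version of values_to_report might look like the sketch below. The helpers `build_report` and `format_row` are hypothetical and only mirror the quantities from the snippet above; the row formatting is a simplification, not the behaviour of the existing formatter.

```python
def build_report(k, function_value, relgrad, radius, rho, status,
                 iterate=None, variable_names=None):
    """Collect the quantities of one iteration in an ordered dictionary."""
    report = {"iteration": k}
    if variable_names is not None:
        # One entry per variable, in the same order as the iterate.
        report.update(zip(variable_names, (float(v) for v in iterate)))
    report.update(
        {
            "function": float(function_value),
            "relgrad": float(relgrad),
            "radius": float(radius),
            "rho": float(rho),
            "status": status,
        }
    )
    return report


def format_row(report):
    """Render the same dictionary as a compact text row for the logger."""
    return " ".join(
        f"{value:.3g}" if isinstance(value, float) else str(value)
        for value in report.values()
    )
```

The same `report` object can then feed both the text logger, through `format_row(report)`, and a tracker call that accepts a dictionary. Non-numeric entries such as `status` may need to be dropped or encoded first, depending on the tracker.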