mle-infrastructure / mle-toolbox

Lightweight Tool to Manage Distributed ML Experiments 🛠
https://mle-infrastructure.github.io/mle_toolbox/toolbox/
MIT License

Keep history of best param/performance in `HyperoptLogger` #52

Closed RobertTLange closed 2 years ago

RobertTLange commented 3 years ago

I want to keep a log of the time series of the best parameter configuration at each batch iteration, together with the corresponding performance. This has to fit into the original pandas dataframe shape.

Currently, the columns of the `hyper_log` dataframe hold the stored variables, while the rows go over the different evaluation runs. The easiest extension could be to add another dataframe (e.g. `best_log`) which has exactly the same columns and one row per evaluated batch.
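A minimal sketch of how such a `best_log` could be derived from the existing log. The column names (`batch_id`, `lrate`, `loss`), the helper `build_best_log`, and the reading of "best" as "best run seen so far, lower is better" are all assumptions made for illustration; none of them are taken from the toolbox.

```python
import pandas as pd

# Hypothetical hyper_log: one row per evaluation run, one column per stored variable.
hyper_log = pd.DataFrame(
    {
        "batch_id": [0, 0, 1, 1, 2, 2],
        "lrate": [0.1, 0.01, 0.05, 0.2, 0.03, 0.3],
        "loss": [0.9, 0.7, 0.8, 0.95, 0.5, 0.6],
    }
)


def build_best_log(log, metric="loss", batch_col="batch_id", minimize=True):
    """Return one row per batch: the best run seen up to and including that batch."""
    rows = []
    for batch in sorted(log[batch_col].unique()):
        # All runs evaluated so far (assumption: "best" means best-so-far).
        seen = log[log[batch_col] <= batch]
        best_idx = seen[metric].idxmin() if minimize else seen[metric].idxmax()
        row = seen.loc[best_idx].copy()
        # Record the batch iteration this entry belongs to, not the batch
        # in which the best run was originally evaluated.
        row[batch_col] = batch
        rows.append(row)
    best = pd.DataFrame(rows).reset_index(drop=True)[list(log.columns)]
    # Restore the original column dtypes, which row-wise extraction may widen.
    return best.astype(log.dtypes.to_dict())


best_log = build_best_log(hyper_log)
print(best_log)
```

The result has the same columns as `hyper_log` and as many rows as there are batches, so it can be handled by the same downstream code.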

In `save_log` we would simply store a dict `{"all_runs": self.opt_log, "best_runs": self.best_log}`. This can then be reloaded as a dict of dicts (say `d`) and, after converting each entry back to a dataframe, transformed into a single df via `pd.concat(d)`. See discussion here.
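The store-and-reload round trip might look roughly like the following. The on-disk format used by `save_log` is not stated here, so an in-memory JSON string stands in for it, and the example data is made up. Note that `pd.concat` expects DataFrame or Series values, which is why each inner dict is turned into a dataframe first.

```python
import json

import pandas as pd

# Hypothetical logs; in the logger these would be self.opt_log and self.best_log.
opt_log = pd.DataFrame({"lrate": [0.1, 0.01, 0.05], "loss": [0.9, 0.7, 0.8]})
best_log = pd.DataFrame({"lrate": [0.1, 0.01], "loss": [0.9, 0.7]})

# Store both logs under one dict (assumption: a JSON-like serialisation).
payload = {
    "all_runs": opt_log.to_dict(orient="list"),
    "best_runs": best_log.to_dict(orient="list"),
}
serialised = json.dumps(payload)

# Reload as a dict of dicts and rebuild one dataframe per log.
d = json.loads(serialised)
frames = {name: pd.DataFrame(columns) for name, columns in d.items()}

# Passing a dict to pd.concat uses its keys as the outer level of a MultiIndex.
combined = pd.concat(frames, names=["log", "row"])
print(combined)
```

Either log can then be recovered with `combined.loc["all_runs"]` or `combined.loc["best_runs"]`.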

We then also need to update the `subselect_hyper_log` function to accommodate the two different logs. Finally, I would also like to add a 1D plot showing the best performance over the batch iterations.
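One possible shape for a log-aware sub-selection is sketched below. The function `subselect_log`, its `which` argument, and the keyword-based filtering are hypothetical and do not reflect the actual signature of `subselect_hyper_log`; the data is made up.

```python
import pandas as pd

# Hypothetical combined log with "all_runs" and "best_runs" as the outer index level.
combined = pd.concat(
    {
        "all_runs": pd.DataFrame(
            {"lrate": [0.1, 0.01, 0.05, 0.2], "loss": [0.9, 0.7, 0.8, 0.95]}
        ),
        "best_runs": pd.DataFrame({"lrate": [0.01, 0.01], "loss": [0.7, 0.7]}),
    }
)


def subselect_log(log, which="all_runs", **fixed_params):
    """Pick one of the two logs, then keep rows matching the fixed parameter values."""
    sub = log.loc[which]
    for name, value in fixed_params.items():
        sub = sub[sub[name] == value]
    return sub


# Runs from the full log with a fixed learning rate.
selected = subselect_log(combined, "all_runs", lrate=0.01)

# Best performance per batch iteration: the series a 1D plot would show.
best_curve = subselect_log(combined, "best_runs")["loss"]
print(selected)
print(best_curve)
```

`best_curve` is an ordinary pandas Series indexed by batch iteration, so it can be handed to any line-plot routine.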

RobertTLange commented 3 years ago

Have a look at the different visualization plots in skopt to see which make sense here and which don't: https://scikit-optimize.github.io/stable/modules/plots.html

RobertTLange commented 2 years ago

Addressed in the `mle-hyperopt` package.