Reproducibility & return N best models

maxpumperla / hyperas

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

http://maxpumperla.com/hyperas/

MIT License


Reproducibility & return N best models #210

Open CharlyEmpereurmot opened 5 years ago

CharlyEmpereurmot commented 5 years ago

Hello, thank you for this great piece of software :) It is extremely useful.

I was wondering how to get fully reproducible results (or averaged/repeated results), maybe by:

1. Using fixed seeds for the Python/NumPy/TensorFlow random number generators, plus PYTHONHASHSEED (env var), as recommended here: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development. But I saw no recommendation about this in the README example.
2. Running each model with each set of parameters 10-50 times and averaging the results.
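
For illustration, option 1 could look roughly like the sketch below. The helper name `seed_everything` and its `seed_tensorflow` flag are assumptions made for this example, not part of hyperas or Keras, and the TensorFlow call assumes TensorFlow 2.x. Note that PYTHONHASHSEED only takes effect when it is set before the interpreter starts, so it should also be exported in the shell.

```python
import os
import random


def seed_everything(seed, seed_tensorflow=False):
    """Seed the random number generators a Keras script typically relies on."""
    # Hash randomization for the running interpreter is fixed at startup,
    # so this assignment only affects child processes. Export
    # PYTHONHASHSEED in the shell as well before launching Python.
    os.environ["PYTHONHASHSEED"] = str(seed)

    random.seed(seed)  # Python's built-in generator

    try:
        import numpy as np
    except ImportError:
        pass  # NumPy is not installed, nothing to seed
    else:
        np.random.seed(seed)  # NumPy's global generator

    if seed_tensorflow:
        # Assumes TensorFlow 2.x; TensorFlow 1.x used tf.set_random_seed.
        import tensorflow as tf
        tf.random.set_seed(seed)


# Re-seeding with the same value reproduces the same random draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True
```

Seeding alone may not be enough for bit-identical training runs, since multi-threaded and GPU execution can introduce their own nondeterminism.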

For case 2, how should I proceed? Did I miss something in the API about repeating a model with the same parameters? (Computing time is not an issue.) Otherwise, could you please recommend another method?

On another note, how can we explore the N best models and their sets of parameters, instead of only the best-performing model?

Thank you very much

maxpumperla commented 5 years ago

@Charly-Empereur-mot thanks. To ensure hyperopt reproducibility, see here: https://github.com/hyperopt/hyperopt/issues/142. I haven't checked whether this translates seamlessly to hyperas, but consider it a start ;)

Best, M

CharlyEmpereurmot commented 5 years ago

Thank you, I was able to obtain reproducible results, though only on CPU so far (still working on GPU), and this requires running on a single thread. Would you have information about the following points, please:

1. What about running each model with each set of parameters 10-50 times and averaging the results (for robustness)? Is there an argument to do so with the current API? If not, I believe it would be useful, and I could help implement it.
2. Also, how can we explore the N best models and their sets of parameters, instead of only the best-performing model? Likewise, I think this would be worth implementing.
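
To make the first point concrete, here is a minimal sketch of what "repeat and average" could mean for a single set of parameters. Everything in it is hypothetical: `train_and_score` is a stand-in that returns a noisy loss so the example runs without Keras, and `averaged_loss` is not an existing hyperas function.

```python
import random
import statistics


def train_and_score(params, seed):
    """Hypothetical stand-in for building, fitting and scoring a Keras model.

    Returns a noisy validation loss so the sketch runs without Keras.
    """
    rng = random.Random(seed)
    noise = rng.gauss(0.0, 0.05)
    return (params["lr"] - 0.01) ** 2 * 1000 + noise


def averaged_loss(params, n_repeats=10, base_seed=0):
    """Evaluate one parameter set several times and summarise the losses."""
    # Each repeat gets its own seed, so the whole procedure is repeatable.
    losses = [train_and_score(params, base_seed + i) for i in range(n_repeats)]
    return {
        "loss": statistics.mean(losses),       # value an optimizer would minimize
        "loss_std": statistics.stdev(losses),  # spread across the repeats
        "params": params,
    }


result = averaged_loss({"lr": 0.02}, n_repeats=10)
print(result["loss"], result["loss_std"])
```

With hyperas, a loop like this would presumably live inside the model function, with the averaged value returned under the `'loss'` key of the result dictionary.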

Thank you very much!

maxpumperla commented 5 years ago

@Charly-Empereur-mot I don't quite understand what you mean by "each model". Do you have a list of models that you want to optimize?

hyperas allows you to return all models for each run (beware: memory consumption). Afterwards, you can do whatever you want with them.
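
As a rough sketch of the "afterwards" step, once the per-run results are available as a list of dictionaries, picking the N best is a small sorting problem. The record layout and the values below are made up for illustration, and `n_best` is a hypothetical helper, not a hyperas function.

```python
import heapq

# Hypothetical per-run records; in practice these would be collected from
# the result dictionaries of each evaluated model. Values are illustrative.
results = [
    {"loss": 0.31, "params": {"units": 64, "dropout": 0.2}},
    {"loss": 0.24, "params": {"units": 128, "dropout": 0.3}},
    {"loss": 0.42, "params": {"units": 32, "dropout": 0.5}},
    {"loss": 0.19, "params": {"units": 256, "dropout": 0.3}},
]


def n_best(results, n):
    """Return the n records with the lowest loss, best first."""
    return heapq.nsmallest(n, results, key=lambda r: r["loss"])


for rank, record in enumerate(n_best(results, 3), start=1):
    print(rank, record["loss"], record["params"])
```

Keeping only the parameters and losses, rather than the fitted models themselves, avoids the memory cost mentioned above; the top candidates can then be retrained from their parameters.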

Also, have a look at our ensemble models, which I presume are what you want: https://github.com/maxpumperla/hyperas/blob/master/hyperas/ensemble.py

Can you do me a favour and do a PR with (or otherwise give me) your example with reproducible results? I think that could be very useful for other users as well.

xuzhang5788 commented 5 years ago

@Charly-Empereur-mot Could you please tell us how you obtained reproducible results? Many thanks

xuzhang5788 commented 5 years ago

@maxpumperla Regarding your ensemble models, I tried to use them in a Jupyter notebook, but it doesn't work. Will there be a notebook version? Thanks