msmbuilder / osprey

🦅Hyperparameter optimization for machine learning pipelines 🦅
http://msmbuilder.org/osprey
Apache License 2.0

Json dump suggestion #207

Closed. jeiros closed this issue 7 years ago.

jeiros commented 7 years ago

When using osprey dump -o json, I was wondering whether it would be more useful to store each parameter as a separate name/value pair instead of putting all of them in a single entry as a dictionary.

That way, when loading the JSON with pd.read_json, for example, each parameter would be stored in its own column. That feels more natural to me and allows for easier plotting: plt.scatter(df['tica__lag_time'], df['mean_test_score'])
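A minimal sketch of what the suggestion would buy on the reading side, assuming a hypothetical flat layout in which every hyperparameter is a top-level key next to the score. The record fields and values below are made up for illustration; they are not real osprey output.

```python
import io

import pandas as pd

# Hypothetical dump in the proposed flat layout: each hyperparameter is its
# own top-level key alongside the score (field names are illustrative only).
flat_json = """[
  {"id": 1, "tica__lag_time": 10, "tica__n_components": 4, "mean_test_score": 2.1},
  {"id": 2, "tica__lag_time": 50, "tica__n_components": 8, "mean_test_score": 2.6}
]"""

# Every key becomes a column directly, so no manual unpacking is needed.
# convert_dates=False stops pandas from trying to parse columns whose names
# end in "_time" (such as tica__lag_time) as timestamps.
df = pd.read_json(io.StringIO(flat_json), convert_dates=False)
print(df[["tica__lag_time", "mean_test_score"]])
```

The convert_dates=False argument is a precaution: by default pd.read_json tries to interpret columns whose names end in _time as dates, which can turn tica__lag_time into datetimes if its values are large enough.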

At the moment, to extract each of the parameters into a separate DataFrame, I have to do something like this:

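import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_json('dump.json')

# Build an empty DataFrame whose columns are the hyperparameter names,
# then fill it with one row per trial from the 'parameters' dicts
params = pd.DataFrame(columns=list(df['parameters'][0].keys()))
for i, hyp_parms in enumerate(df['parameters']):
    params.loc[i] = hyp_parms

# Scatter plot with a single command
plt.scatter(params['tica__lag_time'], df['mean_test_score'])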
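Until the dump format changes, the row-by-row loop above can probably be collapsed into a single call. The sketch below builds a small stand-in for the loaded dump with the same nested 'parameters' column (the values are hypothetical) and expands it by passing the list of dicts to pd.DataFrame:

```python
import pandas as pd

# Stand-in for pd.read_json('dump.json') on the current nested layout:
# hypothetical records where all hyperparameters sit in one 'parameters' dict.
df = pd.DataFrame({
    "parameters": [
        {"tica__lag_time": 10, "tica__n_components": 4},
        {"tica__lag_time": 50, "tica__n_components": 8},
    ],
    "mean_test_score": [2.1, 2.6],
})

# Expand the dicts into one column per hyperparameter in a single step
# (dict keys become column names); reuse df's index so rows stay aligned.
params = pd.DataFrame(df["parameters"].tolist(), index=df.index)

# Optionally join the expanded columns back onto the scores.
flat = df.drop(columns="parameters").join(params)
print(flat)
```

Reusing df.index keeps params aligned with the scores, so plt.scatter(flat['tica__lag_time'], flat['mean_test_score']) would work on the joined frame.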

If you think this is a good idea, maybe you could point out what needs to be changed, and I'd be happy to implement it.

cxhernandez commented 7 years ago

This sounds like a useful improvement to me! All the code for the json dump is contained here.
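On the writing side, the change amounts to merging each trial's parameters dict into the top level of its record before serializing. The sketch below uses a hypothetical helper, flatten_trial, with illustrative trial fields; neither is osprey's actual code or schema. It relies only on the standard json module:

```python
import json


def flatten_trial(trial):
    """Return a copy of a trial record with its 'parameters' dict merged
    into the top level (hypothetical helper, not osprey's actual code)."""
    flat = {k: v for k, v in trial.items() if k != "parameters"}
    for name, value in trial.get("parameters", {}).items():
        if name in flat:
            # Refuse to silently overwrite an existing field such as 'id'.
            raise ValueError(f"parameter {name!r} clashes with a trial field")
        flat[name] = value
    return flat


# Hypothetical trial records in the current nested layout.
trials = [
    {"id": 1, "parameters": {"tica__lag_time": 10}, "mean_test_score": 2.1},
    {"id": 2, "parameters": {"tica__lag_time": 50}, "mean_test_score": 2.6},
]

print(json.dumps([flatten_trial(t) for t in trials], indent=2))
```

Raising on a name clash is one design choice; prefixing parameter names (for example param_tica__lag_time) would be another way to keep them from colliding with fields such as id.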

cxhernandez commented 7 years ago

Done in #212.