microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

Removing redundant string format in the final experiment log #5685

Open olive004 opened 1 year ago


This is a small and very simple request.

In the final experiment summary JSON generated through the NNI WebUI, there are some fields that were originally dictionaries that have been reformatted into strings. This is a small but annoying detail and probably easy to fix.

Most notably, this happens for values in the 'finalMetricData' entry, which contains the default metric for the trial. However, when more than just the default metric is tracked, for example when a dictionary of metrics is reported at each intermediate and final metric recording, the value stored under 'finalMetricData' may look something like this:

'"{\\"train_loss\\": 1.2782151699066162, \\"test_loss\\": 0.9486784338951111, \\"default\\": 0.5564953684806824}"'

when it should simply be

{'train_loss': '1.2782151699066162',
 'test_loss': '0.9486784338951111',
 'default': '0.5564953684806824'}

I've reformatted it with these two simple lines:

keys_values = log['trialMessage'][0]['finalMetricData'][0]['data'].replace('"', '').replace(': ', '').replace(', ', '').strip('{}').split('\\')

reformatted = {k: v for k, v in zip(keys_values[1::2], keys_values[2::2])}

It would be quite nice and save unnecessary reprocessing if this could just be a regular JSON dictionary and not a stringified dictionary :)
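
As a workaround on the reading side, the value above looks like a JSON object that was JSON-encoded twice, so decoding it with `json.loads` until the result is no longer a string may be less brittle than string replacement. This is only a minimal sketch under that assumption; `decode_nested` is a hypothetical helper name, not part of NNI, and the sample string is copied from this issue:

```python
import json


def decode_nested(value):
    """Hypothetical helper: JSON-decode `value` for as long as it is a string."""
    while isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # a plain, non-JSON string: return it unchanged
    return value


# Sample value copied from the issue: a dict that was JSON-encoded twice.
raw = '"{\\"train_loss\\": 1.2782151699066162, \\"test_loss\\": 0.9486784338951111, \\"default\\": 0.5564953684806824}"'

metrics = decode_nested(raw)
print(metrics["default"])
```

Unlike the string-splitting approach, this keeps the metric values as floats rather than strings.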

Reproducing this:

After downloading the experiment summary as a JSON file, the following code reproduces the behavior above (provided the trial reports several metrics in a dict rather than only the default metric):

import json

with open('path_to_experiment_json') as f:
    log = json.load(f)
print(log['trialMessage'][0]['finalMetricData'][0]['data'])
> '"{\\"train_loss\\": 1.2782151699066162, \\"test_loss\\": 0.9486784338951111, \\"default\\": 0.5564953684806824}"'

The same goes for the hyperParameters field in each trial message, which is a list containing a stringified dictionary.

log['trialMessage'][0]['hyperParameters']
> ['{"parameter_id":0,"parameter_source":"algorithm","parameters":{"batch_size":64,"seed":2,"steps":5000,"n_batches":1000,"linear_out1":512,"linear_out2":128,"conv2d_ks":2,"conv2d_out_channels":1},"parameter_index":0}']
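
Judging from the output above, each element of this list appears to be JSON-encoded once, so a single `json.loads` per element should recover the dictionary. A minimal sketch, assuming every element is a valid JSON string; the sample list is copied from this issue and the variable names are illustrative:

```python
import json

# Sample copied from the issue: a list holding one JSON-encoded string.
hyper_parameters = [
    '{"parameter_id":0,"parameter_source":"algorithm","parameters":{"batch_size":64,"seed":2,"steps":5000,"n_batches":1000,"linear_out1":512,"linear_out2":128,"conv2d_ks":2,"conv2d_out_channels":1},"parameter_index":0}'
]

# Decode every element; each one becomes a regular dict.
decoded = [json.loads(item) for item in hyper_parameters]

# The actual search-space values sit under the "parameters" key.
params = decoded[0]["parameters"]
print(params["batch_size"])
```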