I believe the above `training_statistics.json`, produced by training models with `model_type` being `ecd`, is not meant for `compare_performance()` in `visualize.py`:
1) The argument `test_stats_per_model` becomes:

```
[{'evaluation_frequency': {'frequency': 1, 'period': 'epoch'}, 'test': {'combined': {'loss': [6.1890668869018555]}, 'output': {'accuracy': [0.780337929725647], 'accuracy_micro': [0.9599999785423279], 'loss': [6.1890668869018555], 'roc_auc': [0.8079876899719238]}}, 'training': {'combined': {'loss': [24.924232482910156]}, 'output': {'accuracy': [0.7352941036224365], 'accuracy_micro': [0.9324324131011963], 'loss': [24.924232482910156], 'roc_auc': [0.7261029481887817]}}, 'validation': {}}, {'evaluation_frequency': {'frequency': 1, 'period': 'epoch'}, 'test': {'combined': {'loss': [6.1890668869018555]}, 'output': {'accuracy': [0.780337929725647], 'accuracy_micro': [0.9599999785423279], 'loss': [6.1890668869018555], 'roc_auc': [0.8079876899719238]}}, 'training': {'combined': {'loss': [24.924232482910156]}, 'output': {'accuracy': [0.7352941036224365], 'accuracy_micro': [0.9324324131011963], 'loss': [24.924232482910156], 'roc_auc': [0.7261029481887817]}}, 'validation': {}}]
```
2) `convert_to_list(test_stats_per_model)` yields the same structure, so `test_stats_per_model_list == test_stats_per_model`.
3) Thus `output_feature_names` becomes `{'training', 'evaluation_frequency', 'test', 'validation'}`.
4) `metric_names` becomes `{'output', 'combined'}`.
5) Thus `metric_names.remove(LOSS)` on line 1497 raises `KeyError: 'loss'`. If you instead do `metric_names.discard(LOSS)`:
6) `metrics_dict` does not map `metric_names` to lists of metric values, but to lists of dicts containing `loss` and possibly further keys:

```
{'output': [{'accuracy': [0.7352941036224365], 'accuracy_micro': [0.9324324131011963], 'loss': [24.924232482910156], 'roc_auc': [0.7261029481887817]}, {'accuracy': [0.7352941036224365], 'accuracy_micro': [0.9324324131011963], 'loss': [24.924232482910156], 'roc_auc': [0.7261029481887817]}], 'combined': [{'loss': [24.924232482910156]}, {'loss': [24.924232482910156]}]}
```
7) Taking the minimum over such a list of dicts on line 1517 raises `TypeError: '<' not supported between instances of 'dict' and 'dict'` (both failures are reproduced in the sketch below).
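Here is a minimal, self-contained sketch of both failures, using the traced values from steps 4 and 6 above (assuming `LOSS == 'loss'`, as in `ludwig.constants`):

```python
LOSS = "loss"  # mirrors ludwig.constants.LOSS

# Step 5: the keys found in training_statistics.json are section names,
# not metric names, so LOSS is absent and remove() raises KeyError.
metric_names = {"output", "combined"}
try:
    metric_names.remove(LOSS)
except KeyError as err:
    print("remove:", err)    # KeyError: 'loss'
metric_names.discard(LOSS)   # discard() is a no-op for absent keys

# Steps 6-7: each value is a list of dicts, and min() cannot order
# dicts because they do not support '<'.
metrics_dict = {
    "combined": [{"loss": [24.924232482910156]},
                 {"loss": [24.924232482910156]}],
}
try:
    min(metrics_dict["combined"])
except TypeError as err:
    print("min:", err)  # '<' not supported between instances of 'dict' and 'dict'
```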
Before fixing line 1497 and line 1517, it would be very helpful to have a JSON schema for the files passed as `--test_statistics` arguments, or a definition of the structure of `compare_performance`'s first argument `test_stats_per_model` in `visualize.py`.
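For comparison, the shape `compare_performance` appears to expect is the flat per-feature layout of a `test_statistics.json`. This is inferred from the traced code path above, not from an official schema, and the feature name `output` is simply carried over from the example:

```python
# Hypothetical sketch of a test_statistics.json that the traced code
# would accept: top-level keys are output feature names plus 'combined',
# each mapping metric names to values.
expected_test_stats = {
    "combined": {"loss": 6.1890668869018555},
    "output": {
        "accuracy": 0.780337929725647,
        "loss": 6.1890668869018555,
        "roc_auc": 0.8079876899719238,
    },
}
```

With this shape, `output_feature_names` would be `{'combined', 'output'}`, every metric set would contain `'loss'`, and `metric_names.remove(LOSS)` would succeed.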
Right. Instead of `training_statistics.json`, `test_statistics.json` should be passed in. This file is generated by either `model.evaluate` or `model.experiment`. This should fix the `KeyError`.
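A sketch of that workflow via the CLI (the paths, dataset, and model names below are placeholders; flag spellings should be double-checked against `ludwig visualize --help`):

```bash
# Evaluate each trained model; evaluation writes test_statistics.json
# into its output directory.
ludwig evaluate --model_path dir1/model --dataset test.csv --output_directory dir1/eval
ludwig evaluate --model_path dir2/model --dataset test.csv --output_directory dir2/eval

# Compare the models using the evaluation statistics,
# not the training statistics.
ludwig visualize --visualization compare_performance \
    --test_statistics dir1/eval/test_statistics.json dir2/eval/test_statistics.json \
    --model_names model_ecd model_llm \
    --output_directory viz \
    --file_format pdf
```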
**Describe the bug**
Having trained two (mistral) models (with `model_type` either `llm` or `ecd`), I wanted to create a `compare_performance` visualization via `ludwig visualize --visualization compare_performance --test_statistics dir1/training_statistics.json dir2/training_statistics.json`, but I get a `KeyError`.

**To Reproduce**
Create two `training_statistics.json` files in the style of the statistics dump quoted at the top of this issue. Then the visualization call above leads to `metric_names == {"frequency", "period"}` on line 1497 of `visualize.py`, thus causing a `KeyError` for `metric_names.remove(LOSS)`.

Details, including a YAML config that leads to the above `training_statistics.json`, can be found at https://ludwig-ai.slack.com/archives/C01PN6M2RSM/p1699559557928909?thread_ts=1698743347.409099&cid=C01PN6M2RSM

**Expected behavior**
No error, but a PDF being created with visualizations of the model comparison.
**Environment (please complete the following information):**
**Additional context**
The environment is a runpod: