Description
When running the https://github.com/neuralmagic/sparseml/blob/main/src/sparseml/transformers/text_classification.py example for the MNLI dataset, the eval metrics calculated for the `validation_mismatched` dataset overwrite the previously calculated metrics for the standard validation dataset.
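For clarity, here is a minimal sketch of the call pattern that produces the clash, assuming the script follows the usual Hugging Face Trainer evaluation flow; `trainer`, `eval_dataset`, and `mm_eval_dataset` are illustrative names, not identifiers taken from the sparseml source:

```python
from transformers import Trainer


def evaluate_both_splits(trainer: Trainer, eval_dataset, mm_eval_dataset) -> None:
    # Matched validation set: metrics come back as eval_accuracy, eval_loss, ...
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
    trainer.save_metrics("eval", metrics)  # writes eval_results.json, merges into all_results.json

    # Mismatched validation set: the exact same eval_* keys are produced, so this
    # second call rewrites eval_results.json and replaces the matched values
    # that were merged into all_results.json.
    mm_metrics = trainer.evaluate(eval_dataset=mm_eval_dataset)
    trainer.save_metrics("eval", mm_metrics)
```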
To reproduce

Simply run the default example script for text-classification with `TASK_NAME=mnli`.

Expected behavior
When evaluation starts, the first pass runs on the standard MNLI validation dataset and its metrics are saved to `eval_results.json` and `all_results.json`. After that, evaluation on `validation_mismatched` runs and unfortunately uses the same keys as the previous pass (`eval_accuracy`, `eval_loss`, ...), which causes overwriting. Ideally, both sets of metrics would be written to these json files.
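One possible way to keep both sets, sketched below under the assumption that the script mirrors the upstream Hugging Face run_glue.py evaluation loop, is to rename the mismatched metrics (e.g. with a `_mm` suffix) and save a combined dict; again, `trainer`, `eval_dataset`, and `mm_eval_dataset` are placeholder names:

```python
from transformers import Trainer


def evaluate_mnli(trainer: Trainer, eval_dataset, mm_eval_dataset) -> dict:
    combined = {}

    # Matched validation set keeps the standard eval_* keys.
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
    combined.update(metrics)
    trainer.log_metrics("eval", metrics)

    # Mismatched validation set gets a _mm suffix so the keys no longer collide.
    mm_metrics = trainer.evaluate(eval_dataset=mm_eval_dataset)
    mm_metrics = {f"{k}_mm": v for k, v in mm_metrics.items()}
    combined.update(mm_metrics)
    trainer.log_metrics("eval", mm_metrics)

    # A single write of the combined dict keeps eval_accuracy and
    # eval_accuracy_mm side by side in eval_results.json and all_results.json.
    trainer.save_metrics("eval", combined)
    return combined
```

An alternative would be passing `metric_key_prefix="eval_mm"` to the second `trainer.evaluate` call, which yields distinct keys directly instead of renaming them afterwards.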