Description
When running the https://github.com/neuralmagic/sparseml/blob/main/src/sparseml/transformers/text_classification.py example for the MNLI dataset, the eval metrics calculated for the `validation_mismatched` dataset overwrite the previously calculated metrics for the standard validation dataset.
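For clarity, here is a minimal sketch of the call pattern that produces the clash, assuming the script follows the usual Hugging Face Trainer evaluation flow; `trainer`, `eval_dataset`, and `mm_eval_dataset` are illustrative names, not identifiers taken from the sparseml source:

```python
from transformers import Trainer


def evaluate_both_splits(trainer: Trainer, eval_dataset, mm_eval_dataset) -> None:
    # Matched validation set: metrics come back as eval_accuracy, eval_loss, ...
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
    trainer.save_metrics("eval", metrics)  # writes eval_results.json, merges into all_results.json

    # Mismatched validation set: the exact same eval_* keys are produced, so this
    # second call rewrites eval_results.json and replaces the matched values
    # that were merged into all_results.json.
    mm_metrics = trainer.evaluate(eval_dataset=mm_eval_dataset)
    trainer.save_metrics("eval", mm_metrics)
```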
To reproduce

Simply run the default example script for text-classification with `TASK_NAME=mnli`.

Expected behavior
When evaluation starts, the first pass runs on the standard MNLI validation dataset and its metrics are saved to `eval_results.json` and `all_results.json`. After that, evaluation on `validation_mismatched` runs and unfortunately uses the same keys as the previous pass (`eval_accuracy`, `eval_loss`, ...), which causes overwriting. Ideally, both sets of metrics would be written to these json files.
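One possible way to keep both sets, sketched below under the assumption that the script mirrors the upstream Hugging Face run_glue.py evaluation loop, is to rename the mismatched metrics (e.g. with a `_mm` suffix) and save a combined dict; again, `trainer`, `eval_dataset`, and `mm_eval_dataset` are placeholder names:

```python
from transformers import Trainer


def evaluate_mnli(trainer: Trainer, eval_dataset, mm_eval_dataset) -> dict:
    combined = {}

    # Matched validation set keeps the standard eval_* keys.
    metrics = trainer.evaluate(eval_dataset=eval_dataset)
    combined.update(metrics)
    trainer.log_metrics("eval", metrics)

    # Mismatched validation set gets a _mm suffix so the keys no longer collide.
    mm_metrics = trainer.evaluate(eval_dataset=mm_eval_dataset)
    mm_metrics = {f"{k}_mm": v for k, v in mm_metrics.items()}
    combined.update(mm_metrics)
    trainer.log_metrics("eval", mm_metrics)

    # A single write of the combined dict keeps eval_accuracy and
    # eval_accuracy_mm side by side in eval_results.json and all_results.json.
    trainer.save_metrics("eval", combined)
    return combined
```

An alternative would be passing `metric_key_prefix="eval_mm"` to the second `trainer.evaluate` call, which yields distinct keys directly instead of renaming them afterwards.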