huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Exception on saving results in official glue example scripts #20079

Closed · li-plus closed this issue 2 years ago

li-plus commented 2 years ago

System Info

Who can help?

@sgugger, @patil-suraj

Information

Tasks

Reproduction

I was running the official GLUE example script transformers/examples/pytorch/text-classification/run_glue_no_trainer.py on the STS-B task.

export TASK_NAME=stsb
python run_glue_no_trainer.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/

The training went well, but when saving the results it raised the error below:

Configuration saved in /tmp/stsb/config.json
Model weights saved in /tmp/stsb/pytorch_model.bin
tokenizer config file saved in /tmp/stsb/tokenizer_config.json
Special tokens file saved in /tmp/stsb/special_tokens_map.json
Traceback (most recent call last):
  File "run_glue_no_trainer.py", line 633, in <module>
    main()
  File "run_glue_no_trainer.py", line 629, in main
    json.dump({"eval_accuracy": eval_metric["accuracy"]}, f)
KeyError: 'accuracy'

Expected behavior

Some of the GLUE tasks (STS-B, CoLA) don't use "accuracy" as their metric, so we may need to check the metric keys before accessing eval_metric.

https://github.com/huggingface/transformers/blob/504db92e7da010070c36e185332420a1d52c12b2/examples/pytorch/text-classification/run_glue_no_trainer.py#L627-L629
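For illustration only, a guard along these lines would avoid the KeyError (the helper name and path here are just made up for the example, not taken from the script):

import json

def dump_eval_metric(eval_metric: dict, path: str) -> None:
    # Hypothetical helper: keep the "accuracy" shortcut only when that key exists,
    # otherwise dump whatever metric keys the task actually produced
    # (e.g. pearson/spearmanr for STS-B, matthews_correlation for CoLA).
    if "accuracy" in eval_metric:
        results = {"eval_accuracy": eval_metric["accuracy"]}
    else:
        results = {f"eval_{k}": v for k, v in eval_metric.items()}
    with open(path, "w") as f:
        json.dump(results, f)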

BTW, I have noticed that this block of code also appears in many other example scripts, such as multiple-choice and semantic-segmentation. I'm not sure whether those scripts have the same issue.

sgugger commented 2 years ago

Yes, the whole eval_metric dict should probably be dumped without accessing keys. Do you want to open a PR with this change? cc @muellerzr who wrote this.

li-plus commented 2 years ago

Yeah, I'd like to help. The eval_metric should be dumped with all its keys prefixed by eval_, just as run_glue.py does. https://github.com/huggingface/transformers/blob/504db92e7da010070c36e185332420a1d52c12b2/examples/pytorch/text-classification/run_glue.py#L573

I happened to find an example script that has already fixed this issue by prefixing all keys in eval_metric before saving it. https://github.com/huggingface/transformers/blob/6cc06d17394f5715cdf2d13a1ef7680bedaee9e2/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py#L66-L86
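Roughly, the idea there is the following (paraphrased with illustrative names and defaults; the actual helper in that script may differ in its details):

import json
import os

def save_prefixed_metrics(results: dict, output_dir: str,
                          file_name: str = "results.json",
                          metric_key_prefix: str = "eval") -> None:
    # Prefix every metric key that isn't already prefixed, so the dumped file
    # works for accuracy-, correlation-, and F1-based tasks alike.
    for key in list(results.keys()):
        if not key.startswith(f"{metric_key_prefix}_"):
            results[f"{metric_key_prefix}_{key}"] = results.pop(key)
    with open(os.path.join(output_dir, file_name), "w") as f:
        json.dump(results, f, indent=4)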

I will create a PR to migrate this solution to all the remaining unfixed examples. Is that OK?

sgugger commented 2 years ago

That would be great, yeah!