huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Exception on saving results in official glue example scripts #20079

Closed · li-plus closed this issue 2 years ago

li-plus commented 2 years ago

System Info

Who can help?

@sgugger, @patil-suraj

Information

Tasks

Reproduction

I was running the official GLUE example script transformers/examples/pytorch/text-classification/run_glue_no_trainer.py on the STS-B task.

export TASK_NAME=stsb
python run_glue_no_trainer.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/

The training went well, but when saving the results it raised the error below:

Configuration saved in /tmp/stsb/config.json
Model weights saved in /tmp/stsb/pytorch_model.bin
tokenizer config file saved in /tmp/stsb/tokenizer_config.json
Special tokens file saved in /tmp/stsb/special_tokens_map.json
Traceback (most recent call last):
  File "run_glue_no_trainer.py", line 633, in <module>
    main()
  File "run_glue_no_trainer.py", line 629, in main
    json.dump({"eval_accuracy": eval_metric["accuracy"]}, f)
KeyError: 'accuracy'

Expected behavior

Some of the GLUE tasks (STS-B, CoLA) don't use "accuracy" as their metric, so we may need to check the metric keys before accessing eval_metric.

https://github.com/huggingface/transformers/blob/504db92e7da010070c36e185332420a1d52c12b2/examples/pytorch/text-classification/run_glue_no_trainer.py#L627-L629
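For illustration only, a guard along these lines would avoid the KeyError (the helper name and path here are just made up for the example, not taken from the script):

import json

def dump_eval_metric(eval_metric: dict, path: str) -> None:
    # Hypothetical helper: keep the "accuracy" shortcut only when that key exists,
    # otherwise dump whatever metric keys the task actually produced
    # (e.g. pearson/spearmanr for STS-B, matthews_correlation for CoLA).
    if "accuracy" in eval_metric:
        results = {"eval_accuracy": eval_metric["accuracy"]}
    else:
        results = {f"eval_{k}": v for k, v in eval_metric.items()}
    with open(path, "w") as f:
        json.dump(results, f)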

BTW, I have noticed that this block of code also appears in many other example scripts, such as multiple-choice and semantic-segmentation. I'm not sure whether those scripts have the same issue.

sgugger commented 2 years ago

Yes, the whole eval_metric dict should probably be dumped without accessing keys. Do you want to open a PR with this change? cc @muellerzr who wrote this.

li-plus commented 2 years ago

Yeah, I'd like to help. The eval_metric should be dumped with all its keys prefixed by eval_, just as run_glue.py does. https://github.com/huggingface/transformers/blob/504db92e7da010070c36e185332420a1d52c12b2/examples/pytorch/text-classification/run_glue.py#L573

I happened to find an example script that has already fixed this issue by prefixing all keys in eval_metric before saving it. https://github.com/huggingface/transformers/blob/6cc06d17394f5715cdf2d13a1ef7680bedaee9e2/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py#L66-L86
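Roughly, the idea there is the following (paraphrased with illustrative names and defaults; the actual helper in that script may differ in its details):

import json
import os

def save_prefixed_metrics(results: dict, output_dir: str,
                          file_name: str = "results.json",
                          metric_key_prefix: str = "eval") -> None:
    # Prefix every metric key that isn't already prefixed, so the dumped file
    # works for accuracy-, correlation-, and F1-based tasks alike.
    for key in list(results.keys()):
        if not key.startswith(f"{metric_key_prefix}_"):
            results[f"{metric_key_prefix}_{key}"] = results.pop(key)
    with open(os.path.join(output_dir, file_name), "w") as f:
        json.dump(results, f, indent=4)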

I will create a PR to migrate this solution to all the remaining unfixed examples. Is that OK?

sgugger commented 2 years ago

That would be great, yeah!