Yes, the whole eval_metric dict should probably be dumped without accessing individual keys. Do you want to open a PR with this change?
cc @muellerzr who wrote this.
Yeah, I'd like to help. The eval_metric should be dumped with all its keys prefixed by eval_, just like what run_glue.py does.
https://github.com/huggingface/transformers/blob/504db92e7da010070c36e185332420a1d52c12b2/examples/pytorch/text-classification/run_glue.py#L573
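As a rough illustration of that approach, here is a minimal, self-contained sketch. The eval_metric contents and the output path are placeholders standing in for what the no_trainer script computes at the end of evaluation, not the actual code from the repository:

```python
import json
import os

# Placeholder for what metric.compute() returns on STS-B; the real script
# builds this dict during the evaluation loop.
eval_metric = {"pearson": 0.89, "spearmanr": 0.88}
output_dir = "outputs"  # stands in for args.output_dir

# Prefix every key with "eval_" and dump the whole dict, instead of
# reading a hard-coded "accuracy" key that STS-B and CoLA do not produce.
all_results = {f"eval_{k}": v for k, v in eval_metric.items()}

os.makedirs(output_dir, exist_ok=True)
with open(os.path.join(output_dir, "all_results.json"), "w") as f:
    json.dump(all_results, f)
```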
I happened to find an example script that already fixes this issue by prefixing all keys in eval_metric before saving it.
https://github.com/huggingface/transformers/blob/6cc06d17394f5715cdf2d13a1ef7680bedaee9e2/examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py#L66-L86
I will create a PR to migrate this solution to all the remaining unfixed examples. Is that okay?
That would be great, yeah!
System Info
transformers version: 4.25.0.dev0

Who can help?
@sgugger, @patil-suraj
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
I was running the official GLUE example script transformers/examples/pytorch/text-classification/run_glue_no_trainer.py on the STS-B task. The training went well, but saving the results raised an error.
Expected behavior
Some of the GLUE tasks (STS-B, CoLA) don't use "accuracy" as their metric, so the metric keys probably need to be checked before accessing eval_metric.
https://github.com/huggingface/transformers/blob/504db92e7da010070c36e185332420a1d52c12b2/examples/pytorch/text-classification/run_glue_no_trainer.py#L627-L629
BTW, I have noticed that this block of code also appears in lots of other example scripts like multiple-choice, semantic-segmentation, etc. I'm not sure whether those scripts have the same issue.
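For context on why reading a fixed key fails, here is a small sketch of the key mismatch involved. The numeric values are made up; the key names reflect what the GLUE metric returns for each task:

```python
# Different GLUE tasks return different metric keys (placeholder values):
mrpc_metric = {"accuracy": 0.86, "f1": 0.90}
stsb_metric = {"pearson": 0.89, "spearmanr": 0.88}  # no "accuracy" key
cola_metric = {"matthews_correlation": 0.58}        # no "accuracy" key either

# Reading a hard-coded key, as the linked block does, fails for STS-B/CoLA:
try:
    print(stsb_metric["accuracy"])
except KeyError as err:
    print(f"KeyError: {err}")  # -> KeyError: 'accuracy'
```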