huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

KeyError: 'eval_loss' when fine-tuning gpt-2 with run_clm.py #8789

Closed Potomac closed 3 years ago

Potomac commented 4 years ago

Environment info

Who can help

albert, bert, GPT2, XLM: @LysandreJik Trainer: @sgugger

Information

Model I am using (Bert, XLNet ...): GPT2

The problem arises when using:

To reproduce

Steps to reproduce the behavior:

  1. Use run_clm.py file from transformers/examples/language-modeling/
  2. Try to fine-tune gpt-2 model, with your own train file and your own validation file
  3. When you add the "--do_eval" option, an error occurs when the evaluation step is reached:
  File "run_clm.py", line 353, in <module>
    main()
  File "run_clm.py", line 333, in main
    perplexity = math.exp(eval_output["eval_loss"])
KeyError: 'eval_loss'

When I print the contents of eval_output, there is just one key: "epoch".
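For reference, a minimal defensive sketch (not the actual run_clm.py code) of how the failing line could be guarded so that a missing "eval_loss" key yields a clear message instead of a bare KeyError. `perplexity_from_metrics` is a hypothetical helper name:

```python
import math

def perplexity_from_metrics(eval_output):
    # Hypothetical helper: compute perplexity only if the Trainer
    # actually returned an eval_loss; otherwise fail with a hint.
    if "eval_loss" not in eval_output:
        raise ValueError(
            "no eval_loss in metrics %s; the evaluation set may have "
            "produced zero batches" % sorted(eval_output)
        )
    return math.exp(eval_output["eval_loss"])

print(perplexity_from_metrics({"eval_loss": 0.0, "epoch": 3.0}))  # 1.0
```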

The command I use to run run_clm.py:

python run_clm.py \
    --model_name_or_path gpt2 \
    --train_file train.txt \
    --validation_file dev.txt \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --output_dir results/test-clm

Expected behavior

The evaluation step should run without problems.

sgugger commented 4 years ago

This is weird, as the script is tested for evaluation. What does your dev.txt file look like?

Potomac commented 4 years ago

dev.txt contains English text, one sentence per line. The PC I use has two graphics cards, so run_clm.py uses both for training. Perhaps the bug only occurs when two or more GPUs are used for training?

sgugger commented 4 years ago

The script is tested on two GPUs as well as one. Are you sure this file contains enough text to produce at least one batch during evaluation? That is the only reason I can think of for no eval_loss being returned.

Potomac commented 4 years ago

The dev.txt file contains 46 lines, the train file contains 268263 lines.

The specifications of the PC I use:

sgugger commented 4 years ago

As I said, the dev file may be too short to provide at least one batch and return a loss. You should try a longer dev file.
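This matches how run_clm.py prepares data: it concatenates all tokenized text and keeps only whole blocks of block_size tokens (1024 for GPT-2), dropping the shorter remainder. A rough sketch of that arithmetic, where the ~15 tokens-per-line figure is an illustrative assumption, not a measured value:

```python
def num_blocks(total_tokens, block_size=1024):
    # run_clm.py-style grouping: only whole blocks of block_size
    # tokens survive; any shorter tail is dropped.
    return total_tokens // block_size

# Assuming roughly 15 tokens per line (illustrative only):
print(num_blocks(46 * 15))      # 0 -> empty eval set, hence no eval_loss
print(num_blocks(268263 * 15))  # thousands of training blocks
```

With 46 short lines the total token count can easily fall below a single block, so the evaluation dataloader is empty and the Trainer has no loss to report.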

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.