Closed by micheletufano 3 years ago
Hi,
It seems I couldn't reproduce your error.
I guess it's related to multi-GPU training. Did you use multi-GPU training with DataParallel instead of DistributedDataParallel? Running eval_bleu during multi-GPU training can produce incorrect BLEU scores, so it's recommended to run it on a single GPU.
I have committed a change so that only ppl is calculated when evaluating during training. You can evaluate BLEU scores after training finishes.
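The single-GPU recommendation above can be sketched as follows. This is a minimal, hedged example: it assumes the evaluation script reads CUDA_VISIBLE_DEVICES at startup (before any CUDA initialization), which is the standard way PyTorch picks up visible devices; the equivalent shell form is `export CUDA_VISIBLE_DEVICES=0`.

```python
import os

# Restrict the process to a single GPU (device index 0 assumed) so the
# model is not wrapped in DataParallel across multiple devices.
# This must be set before torch initializes CUDA to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

After setting this, `torch.cuda.device_count()` would report 1, and the multi-GPU code path is never entered.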
Hi,
Yes, I was training with multiple GPUs. I pulled the latest version and your change fixes the issue. Thanks for the quick reply and fix. I'm closing the issue.
Hi @celbree,
I'm reopening this because the issue is still present during evaluation (not training), even on a single GPU.
If I run the evaluation script with export CUDA_VISIBLE_DEVICES=0, I still get this error:
Traceback (most recent call last):
File "run.py", line 655, in <module>
main()
File "run.py", line 646, in main
dev_bleu, dev_EM = eval_bleu(args, model, tokenizer, file_type='dev', num=2000)
File "run.py", line 380, in eval_bleu
past_hidden = [x[:, i:i+1].expand(-1, beam_size, -1, -1, -1) for x in outputs]
File "run.py", line 380, in <listcomp>
past_hidden = [x[:, i:i+1].expand(-1, beam_size, -1, -1, -1) for x in outputs]
TypeError: tuple indices must be integers or slices, not tuple
nvidia-smi confirms that only one GPU was in use for that evaluation.
I'm running on Tesla V100.
Could you share the versions of Python, PyTorch, and transformers you're using?
python: 3.6.9, torch: 1.8.0+cu111, transformers: 4.3.2
It seems that in transformers 4.3.2 the output format of the GPT-2 model has changed. You could try downgrading to 3.3.0 and it should work.
Hello, (cc @celbree)
When I try to replicate the results for Text-Code/text-to-code, I get an error on the evaluation step.
File: CodeXGLUE/Text-Code/text-to-code/code/run.py