Thanks for the report! I'll look into a solution for this today.
@sjrl could you quickly verify that installing transformers via pip install git+https://github.com/huggingface/transformers@fix-eval-accum-steps solves this for you? Thanks!
Hey @muellerzr, thanks for the quick fix! And my apologies, I actually can't seem to reproduce the error on my end, but I did check that your change also works.
@muellerzr Sorry for disturbing you. I noticed this PR's change
- if args.eval_accumulation_steps is not None and (step + 1) % args.eval_accumulation_steps == 0:
+ if args.eval_accumulation_steps is not None and self.accelerator.sync_gradients:
breaks the behaviour of evaluation accumulation as described in https://github.com/huggingface/transformers/pull/25819. And in the latest v4.33.1, it has been partially changed back to
- if args.eval_accumulation_steps is not None and self.accelerator.sync_gradients:
+ if args.eval_accumulation_steps is not None and (step + 1) % args.eval_accumulation_steps == 0 and self.accelerator.sync_gradients:
May I ask what the purpose of introducing the self.accelerator.sync_gradients check in the evaluation loop is? In certain cases, self.accelerator.sync_gradients will be set to False during training, which prevents the accumulation in the evaluation loop.
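For context, here is a minimal standalone sketch of the interaction I mean (toy model and optimizer, purely illustrative, not Trainer code): with gradient accumulation enabled in accelerate, sync_gradients is False on every micro-step except the one that actually synchronizes, so an evaluation launched mid-window sees it as False.

```python
# Toy demonstration of how accelerator.sync_gradients behaves under
# gradient accumulation (accelerate's documented accumulate() pattern).
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(4):
    with accelerator.accumulate(model):
        loss = model(torch.randn(1, 2)).sum()
        accelerator.backward(loss)
        optimizer.step()  # accelerate skips the actual step on non-sync micro-steps
        optimizer.zero_grad()
        # True only on the micro-step where gradients are synchronized
        print(step, accelerator.sync_gradients)  # False, False, False, True
    # If Trainer.evaluate() ran between sync points, the eval loop's
    # `and self.accelerator.sync_gradients` check would stay False, and the
    # periodic move of accumulated predictions to the CPU would never fire.
```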
System Info

transformers version: 4.31.0.dev0

Who can help?
Hey @sgugger, I'm tagging you since this has to do with the trainer.
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Using the run_qa.py script in the examples/pytorch/question-answering/ folder, I found that the calculated metrics when using eval_accumulation_steps are not always correct. When not using eval_accumulation_steps with the above script, I get the expected metrics. However, I found that I needed to use eval_accumulation_steps for evaluation of the flan-t5 models with the above parameters on my system; otherwise the memory usage on the GPU would fluctuate from 4 to 8 GB, which could cause an OOM.

I believe I found the cause of the inconsistency in the metrics. Specifically, this line https://github.com/huggingface/transformers/blob/a074a5d34d6411fb00e83a2ed30acf23d8c976b5/src/transformers/trainer.py#L3150 does not cover the edge case where the total number of batches in the evaluation is not exactly divisible by eval_accumulation_steps. For example, if eval_accumulation_steps = 2 and the total number of batches is 613, then only the last batch is used when calculating all_preds
. I was able to partially fix this problem by adding a new variable called total_steps and updating the if statement (a sketch of the idea is below). However, this will still be a problem for dataloaders that don't have a defined length.
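A minimal, self-contained sketch of that idea (toy code, not the actual trainer.py implementation; flush_steps and the surrounding loop are hypothetical stand-ins for the evaluation loop's "move tensors to CPU" logic):

```python
def flush_steps(num_batches: int, eval_accumulation_steps: int) -> list:
    """Toy model of the evaluation loop's flush condition, with an extra
    (step + 1) == total_steps check so the final remainder is included."""
    total_steps = num_batches  # the new variable; assumes len(dataloader) works
    flushes = []
    for step in range(num_batches):
        if (step + 1) % eval_accumulation_steps == 0 or (step + 1) == total_steps:
            flushes.append(step + 1)
    return flushes

# With 613 batches and eval_accumulation_steps = 2, the original modulo check
# alone never fires on batch 613; the added total_steps check covers it.
print(flush_steps(613, 2)[-2:])  # [612, 613]
```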
Expected behavior
Using eval_accumulation_steps should work in every case, even when the number of batches is not divisible by eval_accumulation_steps.