Closed: vidyasiv closed this issue 1 day ago.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Can we get a clarification on this?
System Info
Who can help?
@muellerzr @SunMarc
Information
Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
```bash
cd examples/pytorch/question-answering/
python run_qa.py \
  --model_name_or_path google-bert/bert-base-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/ \
  --max_steps 50 \
  --save_steps 5000
```
Expected behavior

Not sure what is expected, but I see the model checkpoint saved twice (I also see this on v4.43.3). When I go back to transformers v4.40.2, I only see one save, which comes from trainer.save_model(). I suspect the first "Saving model checkpoint" comes from https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa.py#L656 and the second from trainer.save_model(): https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa.py#L657
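For reference, the end of the training section in run_qa.py looks roughly like this (paraphrased from the two lines linked above; `trainer` and `checkpoint` are set up earlier in the script):

```python
# Paraphrased from examples/pytorch/question-answering/run_qa.py:
# train() runs first, then the script saves the model again explicitly,
# which is where the second save is suspected to come from.
train_result = trainer.train(resume_from_checkpoint=checkpoint)
trainer.save_model()  # explicit save; also writes the tokenizer
```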
Can you clarify whether something changed in train() such that a model checkpoint is now saved as part of it? Why did the behavior change, and was it intentional?
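To help isolate which save comes from where, one option (a sketch assuming the standard TrainingArguments API; this is not from the original report) is to rerun with scheduled checkpointing disabled:

```python
from transformers import TrainingArguments

# Trainer writes scheduled checkpoints to <output_dir>/checkpoint-<step>,
# while the script's explicit trainer.save_model() writes to <output_dir>
# itself. With save_strategy="no", Trainer-side saves should be suppressed,
# so any checkpoint-* directory that still appears after train() would
# confirm that the extra save now happens inside train().
args = TrainingArguments(
    output_dir="/tmp/debug_squad/",
    save_strategy="no",  # disable step/epoch checkpointing inside train()
    max_steps=50,
)
```

Equivalently, the same setting can be passed to the example script as --save_strategy no.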
cc: @jiminha, @emascare, @libinta, @regisss