We are trying the same command from https://github.com/huggingface/transformers/tree/master/examples (except that instead of bert-base-cased we are using bert-large-uncased-whole-word-masking) on 8x V100 GPUs, but we get a CUDA out-of-memory error (CUDA out of memory. Tried to allocate 216.00 MiB....) that stops training partway through. According to the examples page it should work. Any tips would be appreciated.
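For concreteness, a sketch of the command we are running, assuming the run_squad.py flags from the examples README of that version (the per-GPU batch size of 24 and 12 accumulation steps are what we have set; SQUAD_DIR and the remaining flag values are illustrative and may not match the README exactly):

```bash
export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-large-uncased-whole-word-masking \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --per_gpu_train_batch_size 24 \
  --gradient_accumulation_steps 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir ./wwm_uncased_finetuned_squad/
```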
BERT-large is bigger than BERT-base, and you're using a batch size of 24, which is big, especially with 12 gradient accumulation steps.
Reduce your batch size so that the model and your tensors fit on the GPU and you won't hit this error!
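To make the arithmetic explicit: GPU memory is driven by the per-GPU batch (24 sequences of up to 384 tokens through BERT-large, assuming the README's max_seq_length, plus activations), while the batch the optimizer effectively sees is per_gpu_batch x accumulation_steps x n_gpus = 24 x 12 x 8 = 2304 examples per update, which is almost certainly far larger than intended.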
Right @LysandreJik, reducing the batch size did fix the error, but the model we end up with does not seem to be as good as the one provided by huggingface.
In our closed-domain QnA demo, https://demos.pragnakalp.com/bert-chatbot-demo, the answers are quite good when we use the model provided by huggingface (bert-large-uncased-whole-word-masking-finetuned-squad). But when we fine-tune it ourselves, even though we get a 93.XX F1 score, the model is not as accurate as the one in the demo.
What other parameters did huggingface use to produce the "bert-large-uncased-whole-word-masking-finetuned-squad" model?
If the only difference between the command you used and the command available here is the batch size, you could adjust the gradient accumulation steps so that the effective batch size is unchanged. For example, if you set the batch size to 6 (1/4 of the specified batch size of 24), you can multiply the gradient accumulation steps by 4 (12 -> 48) to keep the same effective batch size, as in the sketch below.
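A minimal sketch of that adjustment, assuming the same run_squad.py flags as in the command quoted above (only the two batch-related flags change):

```bash
# Same command as above; only the two batch-related flags differ.
# 6 * 48 = 288 = 24 * 12, so the effective batch per optimizer step is preserved.
python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-large-uncased-whole-word-masking \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --per_gpu_train_batch_size 6 \
  --gradient_accumulation_steps 48 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir ./wwm_uncased_finetuned_squad/
```

Since gradient accumulation only changes how many forward/backward passes happen between optimizer steps, the optimization trajectory stays near-identical while the per-GPU memory footprint drops by 4x.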
What exact_match result did you obtain alongside the 93.xx F1 score?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.