Closed: segef closed this issue 3 years ago
Hello! What is your machine? When you run the script, at which point does it fail? Right off the bat, or after a few sequences have been processed?
I have tried it on my local GTX 1650 and also on a 16GB T100. Both fail while processing the first sequence. It is not always at the same line, but mostly during the `forward` of the `SelfAttention` module of the BERT model. I also decreased the input sizes while processing the data with the tokenizer (roughly as in the sketch below); it then manages to process one sequence but fails with an OOM again on the second. Additionally, I tried training directly on Colab, and it fails with an OOM there too.
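For reference, this is roughly what "decreasing the input sizes" looks like, a minimal sketch assuming the preprocessing style of the BERT2BERT notebook. The function name is mine, the column names are the ones from the CNN/DailyMail dataset, and the `max_length` values of 256/64 are illustrative, not the notebook's defaults:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def map_to_model_inputs(batch):  # hypothetical name, applied via datasets.map(..., batched=True)
    # Shorter sequences are the biggest lever: self-attention memory
    # grows quadratically with sequence length.
    inputs = tokenizer(batch["article"], padding="max_length",
                       truncation=True, max_length=256)
    outputs = tokenizer(batch["highlights"], padding="max_length",
                        truncation=True, max_length=64)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    # Replace pad tokens in the labels with -100 so the loss ignores them.
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels]
        for labels in outputs.input_ids
    ]
    return batch
```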
Not sure how or why, but the training started working on the T100 even though I haven't really changed anything. The GPU might just have been overloaded back then. I will close this issue.
Hi, I am trying to train a Bert2Bert model for text summarization. I followed the exact steps in BERT2BERT for CNN/Dailymail; the only things I changed are the training arguments and the metrics. I have also tried replacing `seq2seq_trainer` with `Seq2SeqTrainer` from the package itself, with the same result. I am using the `bert-base-uncased` model for BERT and CNN/DailyMail as the dataset (just as in the colab). Even with `batch_size=1` I get an OOM; it seems like CUDA does not free any memory at all. The versions of my `transformers` and `torch` are as follows: transformers 4.2.0, torch 1.7.1+cu110.
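In case it helps, these are roughly the memory-conscious training arguments I am experimenting with, a minimal sketch assuming `Seq2SeqTrainingArguments` from that same library version; the `output_dir` and step counts are placeholders, and fp16 plus gradient accumulation are standard ways to cut activation memory while keeping the effective batch size up:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical memory-conscious settings: fp16 roughly halves activation
# memory, and gradient accumulation gives an effective batch size of 16
# while each step only holds a single example on the GPU.
training_args = Seq2SeqTrainingArguments(
    output_dir="./bert2bert-cnn_dailymail",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,
    predict_with_generate=True,
    logging_steps=100,
    save_steps=500,
)
```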
Can you help me with this? What do you think the problem might be?