xuyige / BERT4doc-Classification

Code and source for the paper "How to Fine-Tune BERT for Text Classification?"
Apache License 2.0

Resource exhausted #12

Open · rajae-Bens opened this issue 3 years ago

rajae-Bens commented 3 years ago

Hi,

First, thank you for sharing your code with us.

I am trying to further pretrain a BERT model on my own corpus on a Colab GPU, but I am getting a resource exhausted error. Can someone tell me how to fix this?

Also, what is the expected output of this further pretraining? Is it the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?

Thank you.

chen3082 commented 3 years ago

Hey, I encountered the same issue. Were you able to resolve it? I keep getting `OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[768,768]`. I already reduced the batch size to 3, but it didn't work.

xuyige commented 3 years ago

> Hi,
>
> First, thank you for sharing your code with us.
>
> I am trying to further pretrain a BERT model on my own corpus on a Colab GPU, but I am getting a resource exhausted error. Can someone tell me how to fix this?
>
> Also, what is the expected output of this further pretraining? Is it the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?
>
> Thank you.

Sorry for the late answer! I am not very familiar with TensorFlow, but here are some suggestions:

  1. Check your TensorFlow version and make sure it is 1.1x.
  2. If you hit OOM problems, reduce your batch size or your max sequence length. The official BERT repo provides an example (the max-batch-size table in its out-of-memory section); a sketch of a reduced-memory invocation follows this list.
  3. We do not provide resources for fine-tuning with TensorFlow; you can check the official BERT repo if you want.
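As a rough sketch of point 2 (all paths, step counts, and hyperparameter values below are placeholders, not values from this repo), further pretraining with the official BERT repo's `run_pretraining.py` on a 12GB GPU might look like:

```bash
# Confirm the TensorFlow version first (the official scripts expect TF 1.x):
python -c "import tensorflow as tf; print(tf.__version__)"

# Further pretraining with a reduced batch size and sequence length.
# --max_predictions_per_seq must match the value used when the data
# was built with create_pretraining_data.py.
python run_pretraining.py \
  --input_file=/path/to/tf_examples.tfrecord \
  --output_dir=/path/to/pretraining_output \
  --do_train=True \
  --bert_config_file=/path/to/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=/path/to/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --train_batch_size=8 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --learning_rate=2e-5
```

If this runs, the output directory will contain TensorFlow checkpoint files (`model.ckpt-*`); together with the original `bert_config.json` and `vocab.txt`, those are the files you can then point a fine-tuning script at.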
xuyige commented 3 years ago

> Hey, I encountered the same issue. Were you able to resolve it? I keep getting `OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[768,768]`. I already reduced the batch size to 3, but it didn't work.

Sorry for the late answer! If you have OOM problems, please reduce your batch size and max sequence length. The official BERT repo provides examples for a 12GB GPU; see the table below.
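For reference, the out-of-memory section of the official BERT README reports maximum batch sizes for BERT-Base on a single 12GB Titan X; the numbers below are reproduced from memory of that table, so treat them as approximate:

| max_seq_length | max train_batch_size |
| --- | --- |
| 64 | 64 |
| 128 | 32 |
| 256 | 16 |
| 320 | 14 |
| 384 | 12 |
| 512 | 6 |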

rajae-Bens commented 3 years ago

Hi,

Thank you for answering.

I reduced train_batch_size to 8 and max_seq_length to 40, but I still get the resource exhausted error.

I am running the code on a Colab GPU with 12GB RAM. Any ideas, please?

Thank you.

xuyige commented 3 years ago

> Hi,
>
> Thank you for answering.
>
> I reduced train_batch_size to 8 and max_seq_length to 40, but I still get the resource exhausted error.
>
> I am running the code on a Colab GPU with 12GB RAM. Any ideas, please?
>
> Thank you.

From your description: does your model contain some other NN modules? Is your Colab GPU shared with other processes? Do you have enough CPU resources (e.g., is it a CPU OOM rather than a GPU OOM)? The commands below may help you check.
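A minimal sketch for checking where the memory is going (prefix each command with `!` in a Colab cell; the grep pattern is just one common signature of the Linux OOM killer):

```bash
# GPU memory: how much is allocated, and whether another process holds it
nvidia-smi

# Host (CPU) memory: if this is nearly full, the OOM may be on the CPU side
free -h

# Kernel log: the Linux OOM killer leaves a trace when it kills a process for RAM
dmesg | grep -i "out of memory"
```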