utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

Question: Fine tuning LM getting "RuntimeError: CUDA out of memory." #136

Open kleysonr opened 4 years ago

kleysonr commented 4 years ago

I'm able to train a classification model on top of a pretrained LM with the following parameters:


from transformers import BertTokenizer
from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy

# DATA_PATH, OUT_PATH, device and logger are defined earlier in the script
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-uncased', do_lower_case=True)

databunch = BertDataBunch(
                DATA_PATH,
                DATA_PATH, 
                tokenizer=tokenizer,
                batch_size_per_gpu=14, 
                max_seq_length=100, 
                multi_gpu=False,
                multi_label=False,
                model_type='bert'
            )

# Setup Learner object
learner = BertLearner.from_pretrained_model(
                                databunch,
                                pretrained_path='bert-base-multilingual-uncased',
                                metrics=[{"name": "accuracy", "function": accuracy}], 
                                device=device,
                                multi_gpu=False,
                                is_fp16=False,
                                multi_label=False,
                                logging_steps=0,
                                output_dir=OUT_PATH,
                                logger=logger
                            )
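
The training call itself isn't shown above; following the fast-bert README it would look roughly like the sketch below (the epochs and lr values here are illustrative, not the ones actually used):

# Sketch of the subsequent training call, per the fast-bert README;
# epochs and lr are placeholder values.
learner.fit(
    epochs=4,
    lr=6e-5,
    validate=True,                   # evaluate after each epoch
    schedule_type="warmup_cosine",
    optimizer_type="lamb",
)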

When trying to fine-tune the LM with the same batch_size and max_seq_length, I get an OOM error.

RuntimeError: CUDA out of memory. Tried to allocate 208.00 MiB (GPU 0; 5.94 GiB total capacity; 4.08 GiB already allocated; 213.44 MiB free; 29.05 MiB cached)

from fast_bert.data_lm import BertLMDataBunch
from fast_bert.learner_lm import BertLMLearner

# texts is the list of raw documents used to build the LM databunch
databunch_lm = BertLMDataBunch.from_raw_corpus(
                    data_dir=DATA_PATH,
                    text_list=texts,
                    tokenizer=tokenizer,
                    batch_size_per_gpu=14,
                    max_seq_length=100,
                    multi_gpu=False,
                    model_type='bert',
                    logger=logger
                )            

# Setup Learner object
learner = BertLMLearner.from_pretrained_model(
                            dataBunch=databunch_lm,
                            pretrained_path='bert-base-multilingual-uncased',
                            output_dir=MODEL_PATH,
                            metrics=[],
                            device=device,
                            logger=logger,
                            multi_gpu=False,
                            logging_steps=0,
                            is_fp16=False
                        )   

No matter how much I reduce the batch_size and max_seq_length, I always get OOM. The only way I can fine-tune the model is with batch_size = 1.

Shouldn't I be able to fine-tune the LM with the same batch_size and max_seq_length that I used to train the text classification model?
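
For what it's worth, a rough back-of-the-envelope estimate (sketched below; the ~105k vocab size for bert-base-multilingual-uncased is an assumption) shows why the LM head is so much heavier than the classification head: the masked-LM objective produces logits over the entire vocabulary for every token position, while the classifier produces only a handful of logits per example.

# Back-of-the-envelope: size of the output logits tensor alone (fp32, forward pass only).
batch_size = 14
seq_len = 100
vocab_size = 105_879      # assumed vocab size of bert-base-multilingual-uncased
num_labels = 2            # hypothetical 2-class classifier
bytes_fp32 = 4

mlm_logits_bytes = batch_size * seq_len * vocab_size * bytes_fp32   # one logit per vocab entry per token
cls_logits_bytes = batch_size * num_labels * bytes_fp32             # one logit per class per example

print(f"MLM head logits:        {mlm_logits_bytes / 2**20:8.1f} MiB")   # ~565 MiB
print(f"Classifier head logits: {cls_logits_bytes / 2**20:8.6f} MiB")   # ~0.0001 MiB

That one activation (plus its gradient during loss.backward()) is already a big slice of a 6 GiB card on top of the model weights and optimizer state, so hitting OOM at a batch size that was fine for classification is not surprising.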

kleysonr commented 4 years ago

Well,

Even using bs = 1, I still get OOM after a few epochs.

12/02/2019 11:29:29 - INFO - root -   Running evaluation                                                                                         
12/02/2019 11:29:29 - INFO - root -   Num examples = 92
12/02/2019 11:29:29 - INFO - root -   Validation Batch size = 2
12/02/2019 11:29:38 - INFO - root -   eval_loss after epoch 4: 0.09997701385746831: ██████████████████| 100.00% [46/46 00:08<00:00]
12/02/2019 11:29:38 - INFO - root -   eval_perplexity after epoch 4: 1.1051454544067383: 
12/02/2019 11:29:38 - INFO - root -   lr after epoch 4: 3.6180339887498953e-05
12/02/2019 11:29:38 - INFO - root -   train_loss after epoch 4: 4.449118164323625
12/02/2019 11:29:38 - INFO - root -   

Traceback (most recent call last):                                                                                                              
  File "02_finetuning_LM.py", line 111, in <module>
    optimizer_type="lamb")
  File "/home/kleysonr/.virtualenvs/fastbert/lib/python3.6/site-packages/fast_bert/learner_lm.py", line 156, in fit
    loss.backward()
  File "/home/kleysonr/.virtualenvs/fastbert/lib/python3.6/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/kleysonr/.virtualenvs/fastbert/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 208.00 MiB (GPU 0; 5.94 GiB total capacity; 4.08 GiB already allocated; 220.50 MiB free; 10.55 MiB cached)

DanyalAndriano commented 4 years ago

Try restarting your Jupyter kernel and increasing the gradient accumulation. I used a grad accumulation of 8 with a train batch size of 8, which gives an effective train batch size of 64 at a seq length of 256. Without grad accumulation, not even a batch size of 4 would run at a 256 seq length.
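
For reference, a sketch of how that could look here, assuming the installed fast-bert version accepts grad_accumulation_steps on BertLMLearner.from_pretrained_model (the classification learner documents this parameter); the 8 x 8 split is just one choice:

# Sketch: smaller per-step batch plus gradient accumulation (8 x 8 = effective batch of 64).
# grad_accumulation_steps is assumed to be accepted by BertLMLearner.from_pretrained_model
# in the installed fast-bert version.
databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir=DATA_PATH,
    text_list=texts,
    tokenizer=tokenizer,
    batch_size_per_gpu=8,        # per-step batch kept small
    max_seq_length=100,
    multi_gpu=False,
    model_type='bert',
    logger=logger
)

learner = BertLMLearner.from_pretrained_model(
    dataBunch=databunch_lm,
    pretrained_path='bert-base-multilingual-uncased',
    output_dir=MODEL_PATH,
    metrics=[],
    device=device,
    logger=logger,
    multi_gpu=False,
    logging_steps=0,
    is_fp16=False,
    grad_accumulation_steps=8    # gradients accumulated over 8 steps before each optimizer update
)

If apex is installed, setting is_fp16=True would also roughly halve the activation memory.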