n-waves / multifit

Code to reproduce the results of the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" (https://arxiv.org/abs/1909.04761).
MIT License

OOM during finetuning #77

Open Shiro-LK opened 4 years ago

Shiro-LK commented 4 years ago

Hi,

Thank you for sharing your repo.

I am trying to fine-tune a LM with MultiFiT on a custom dataset and then fine-tune the classifier for prediction. Unfortunately, I get an OOM after a few steps while training the classifier. I tried training the LM first, closing the session to free the GPU memory, and then training the classifier (loading the encoder weights, if my code is right), but it does not help. I cannot use the same batch size for the classifier. Is that normal, or am I doing something wrong?

PS: bs = 256

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>()
      3 learn_cls_fwd.load_encoder("encoder_lm_fr_fwd")
      4 learn_cls_fwd.freeze()
----> 5 learn_cls_fwd.fit_one_cycle(3)
      6 learn_cls_fwd.save("multifit_cls_pretrained_fr")

9 frames
/usr/local/lib/python3.6/dist-packages/fastai/text/learner.py in <listcomp>(.0)
    253     def concat(self, arrs:Sequence[Sequence[Tensor]])->List[Tensor]:
    254         "Concatenate the `arrs` along the batch dimension."
--> 255         return [torch.cat([l[si] for l in arrs], dim=1) for si in range_of(arrs[0])]
    256
    257     def reset(self):

RuntimeError: CUDA out of memory. Tried to allocate 1.02 GiB (GPU 0; 15.90 GiB total capacity; 12.72 GiB already allocated; 599.88 MiB free; 14.61 GiB reserved in total by PyTorch)
```

My piece of code:

```
# pretrained LM
if pretrained_lm:
    data_lm_fwd = (TextList.from_df(lm_tr.iloc[:10000], path, cols='comment_text', **fa_config)
                   .split_by_rand_pct(0.05, seed=42)
                   .label_for_lm()
                   .databunch(bs=bs, num_workers=4))
    data_lm_fwd.save("fr_data_lm_forward")

if pretrained_lm:
    learn_fwd = exp.finetune_lm.get_learner(data_lm_fwd)
    learn_fwd.model.cuda()
    learn_fwd.lr_find()
    learn_fwd.recorder.plot()

# learn is a preconfigured fastai learner with a pretrained model loaded
if pretrained_lm:
    learn_fwd.fit_one_cycle(2)
    learn_fwd.unfreeze()
    for i in range(5):
        learn_fwd.fit_one_cycle(2)
    learn_fwd.save_encoder("encoder_lm_fr_fwd")

# cls
if pretrained_cls:
    data_cls = (TextList.from_df(tr1, path, cols="comment_text", **fa_config)
                .split_from_df(col="val")
                .label_from_df(cols="toxic")
                .databunch(bs=64, num_workers=2))

if pretrained_cls:
    learn_cls_fwd = exp.classifier.get_learner(data_cls)  # , metrics=[AUROC]
    learn_cls_fwd.load_encoder("encoder_lm_fr_fwd")
    learn_cls_fwd.freeze()
    learn_cls_fwd.fit_one_cycle(3)
    learn_cls_fwd.save("multifit_cls_pretrained_fr")
```
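
For context on where this blows up: the failing frame is fastai's `MultiBatchEncoder.concat`, which concatenates the encoder outputs of every `bptt`-sized chunk of a document before pooling, so classifier memory grows with document length as well as batch size. Below is a minimal sketch of mitigations, assuming the fastai v1 setup above; `tr1`, `path`, `fa_config`, `exp`, and `learn_fwd` are the names from that snippet, and the truncation length and batch size are untuned placeholders, not a confirmed fix.

```
import gc
import torch
from fastai.text import *

# Free the LM learner in-process instead of restarting the session.
del learn_fwd
gc.collect()
torch.cuda.empty_cache()

# Truncate long outliers before building the databunch: MultiBatchEncoder
# keeps activations for every chunk of a document, so a few very long
# comments can dominate memory. 1400 characters is an arbitrary placeholder.
tr1["comment_text"] = tr1["comment_text"].str.slice(0, 1400)

# Rebuild the classifier databunch with a smaller batch size.
data_cls = (TextList.from_df(tr1, path, cols="comment_text", **fa_config)
            .split_from_df(col="val")
            .label_from_df(cols="toxic")
            .databunch(bs=16, num_workers=2))

# Optionally train in mixed precision; whether fp16 plays well with the
# QRNN-based MultiFiT encoder is an assumption, so drop it if it misbehaves.
learn_cls_fwd = exp.classifier.get_learner(data_cls).to_fp16()
```

Of these, the smaller batch size and the truncation are the least invasive; note that `torch.cuda.empty_cache()` only returns cached blocks to the driver, so it helps with fragmentation but cannot fix genuine over-allocation.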