prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit
MIT License

GPU Consumption keeps on increasing #25

Closed · nikhilbyte closed this 2 years ago

nikhilbyte commented 2 years ago

Hi, I started training the model with the following parameters:

python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --langs hi_IN --batch_size_indicates_lines --pretrained_model "facebook/mbart-large-50" --model_path "facebook/mbart-large-50" --tokenizer_name_or_path "facebook/mbart-large-50" --mono_src "sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt" --shard_files --batch_size 2

It starts training; however, after a few hours it crashes due to an OOM error. Monitoring the GPU, I found that its memory consumption keeps increasing.

The GPU has 48 GB of memory.
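
In case it helps, this is the kind of snippet I could drop into the training loop to track memory per step (a rough sketch using PyTorch's memory counters, not something already in yanmtt):

```python
# Rough sketch (not part of yanmtt): log CUDA memory counters from inside
# the training loop so the growth can be tied to specific steps/batches.
import torch

def log_gpu_memory(step, device=0):
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated(device) / gib   # memory held by live tensors
    reserved = torch.cuda.memory_reserved(device) / gib     # memory cached by the allocator
    peak = torch.cuda.max_memory_allocated(device) / gib    # high-water mark so far
    print(f"step {step}: allocated={allocated:.2f} GiB, "
          f"reserved={reserved:.2f} GiB, peak={peak:.2f} GiB")
```

A steadily rising "allocated" value would suggest ever-longer batches or tensors kept alive across steps, while a flat "allocated" with growing "reserved" would point at allocator caching.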

Can you please tell me what could cause this? Thanks

prajdabre commented 2 years ago

Most likely a very long sequence. Try setting the --hard_truncate_length flag to a smaller value; it is currently 1024, which may be too much, so try 256. Try to find the example on which you get the OOM, or paste the error logs. I've never actually tested the pretraining functionality on mBART-50, so it will be helpful to know what's causing the issue.
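
For concreteness, your command with that flag added would look something like this (256 is just a starting point, adjust as needed):

```bash
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --langs hi_IN \
    --batch_size_indicates_lines --pretrained_model "facebook/mbart-large-50" \
    --model_path "facebook/mbart-large-50" --tokenizer_name_or_path "facebook/mbart-large-50" \
    --mono_src "sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt" --shard_files \
    --batch_size 2 --hard_truncate_length 256
```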

For reference, I've fine-tuned mBART-50 on a 32 GB GPU, and whenever I get OOMs it's usually because of a stray example.
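
If you want to hunt for the stray example, a quick scan of the monolingual file with the mBART-50 tokenizer should surface the worst offenders. This is just a rough sketch, not part of the toolkit, and the token counts are approximate:

```python
from transformers import AutoTokenizer

# Approximate per-line subword counts with the same tokenizer used for training.
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")

too_long = []
with open("sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        n_tokens = len(tokenizer(line.strip())["input_ids"])
        if n_tokens > 256:  # the proposed --hard_truncate_length
            too_long.append((n_tokens, line_no))

too_long.sort(reverse=True)
print(f"{len(too_long)} lines exceed 256 tokens")
print("worst offenders (token_count, line_number):", too_long[:10])
```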