Hi, I am trying to fine-tune the 160gb model on a custom dataset with the following command, but even with the smallest settings it goes out of memory after a few update steps. I also tried removing --fp16, but I don't see any memory improvement.
I tried --max-source-positions at 512, 768, and 1024; --update-freq at 1, 2, 4, and 8; and --batch-size at 1, 2, 4, and 8, with --fp16 both enabled and disabled. When I remove --tensorboard-logdir $TENSORBOARD_LOGDIR it works, but I can't go beyond a batch size of 2, so training is slow overall.
Are there any other settings needed for a multi-GPU run?
I am wondering how this ran on 8x NVIDIA V100 (16GB) GPUs with the settings given in the README file.
Let me know.
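For what it's worth, the effective batch size in a setup like this is the product of the per-GPU batch size, the gradient-accumulation factor (--update-freq), and the number of GPUs. A small helper (hypothetical, just to make the trade-off explicit) shows how a 4-GPU box can match an 8-GPU run by doubling --update-freq instead of the memory-hungry per-GPU batch:

```python
def effective_batch_size(batch_size: int, update_freq: int, num_gpus: int) -> int:
    """Sentences contributing to one optimizer step:
    per-GPU batch * gradient-accumulation steps * number of GPUs."""
    return batch_size * update_freq * num_gpus

# Illustrative numbers only: an 8x V100 run with batch 2 and update-freq 4
# matches a 4x T4 run with batch 2 and update-freq 8.
assert effective_batch_size(2, 4, 8) == effective_batch_size(2, 8, 4) == 64
```

So if the README settings fit on 16GB V100s, raising --update-freq (which costs no extra GPU memory) rather than --batch-size should reproduce the same effective batch size on fewer GPUs.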
OS - Ubuntu 16.04
CUDA - 10
Machine - 4x T4 GPUs (16GB each), AWS g4dn.12xlarge instance
Libraries - pytorch-transformers==1.2.0, torch==1.4.0, fairseq==0.9.0