Closed Manas-Embold closed 3 years ago
If you use the same settings as the repo (i.e., source_length=256 and target_length=128), you should be able to run the code2text model with batch size 16 on one P100.
However, according to your description, you should check whether your CPU memory can store all 5 lakh (500K) data points.
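For a rough sense of scale, here is a back-of-the-envelope estimate of the tensor storage only (not from the repo; it assumes int64 tensors and the default lengths above, and ignores the Python object overhead of the intermediate feature list, which can be considerably larger):

```python
# Hypothetical estimate of CPU memory for the cached ids/masks tensors.
n_examples = 500_000          # ~5 lakh data points
source_length = 256
target_length = 128
bytes_per_token = 8           # torch.long / int64

# source_ids + source_mask + target_ids + target_mask
with_masks = n_examples * 2 * (source_length + target_length) * bytes_per_token
# source_ids + target_ids only
without_masks = n_examples * (source_length + target_length) * bytes_per_token

print(f"ids + masks : {with_masks / 1e9:.1f} GB")    # ~3.1 GB
print(f"ids only    : {without_masks / 1e9:.1f} GB") # ~1.5 GB
```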
I am using the same settings, except that I have more data points. If I take a subset of 2.5 lakh data points, training starts; however, with 5 lakh+ files it just quits with a memory allocation error.
I think it is consuming the memory at the "converting examples to features" step.
Yes. One suggestion is to remove source_mask and target_mask. You can use source_ids.ne(1) and target_ids.ne(1) to obtain source_mask and target_mask on the fly, which can save half of the memory.
Can you suggest the line numbers in the code, and will it have any impact on training? I just want to be sure about what I need to change and whether it would affect the results.
You just need to remove all source_mask and target_mask variables and replace them with source_ids.ne(1) and target_ids.ne(1), respectively. It would not have any impact: since source_mask = source_ids.ne(1) and target_mask = target_ids.ne(1), removing the source_mask and target_mask variables will only save memory.
Hi there, I am working on a code2text problem. I have created my own dataset of JavaScript code/comment pairs in ".jsonl" format with 5 lakh+ (500K+) data points. However, I am unable to start training on a P100 Google Colab GPU (16 GB VRAM), even with batch size 1, due to memory issues.
If I reduce the data points to around 2.5 lakh from the original 5 lakh, I am able to start training.
Any thoughts on which step in the code consumes so much memory that training cannot start even with batch size 1? I want to train on the entire 5 lakh files on Google Colab.