Open · agemagician opened this issue 5 years ago
I solved the problem by changing line 29 in data_utils.py:

```python
# Evenly divide the data across the bsz batches.
# self.data = data.view(bsz, -1).t().contiguous().to(device)
self.data = data.view(bsz, -1).t().contiguous().to('cpu')
```

Apparently, train.py passes cuda as the device, and that was the issue.
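For anyone hitting the same thing, here is a minimal sketch of the pattern this fix enables: keep the full corpus tensor in host memory and move only the current batch to the GPU when it is fetched. The class and method names mirror what looks like Transformer-XL's LMOrderedIterator, but treat this as an illustration of the idea, not the exact upstream code:

```python
import torch

class LMOrderedIterator:
    """Simplified iterator: the corpus stays on the CPU, batches move to the GPU on demand."""

    def __init__(self, data, bsz, bptt, device='cuda'):
        self.bsz = bsz
        self.bptt = bptt
        self.device = device
        # Trim the tail so the token stream divides evenly into bsz columns.
        n_step = data.size(0) // bsz
        data = data[:n_step * bsz]
        # NOTE: no .to(device) here -- the full corpus stays in host memory.
        self.data = data.view(bsz, -1).t().contiguous()  # shape (n_step, bsz)

    def get_batch(self, i):
        seq_len = min(self.bptt, self.data.size(0) - 1 - i)
        data = self.data[i:i + seq_len]
        target = self.data[i + 1:i + 1 + seq_len]
        # Only this bptt-sized slice is copied to the GPU.
        return data.to(self.device), target.to(self.device), seq_len

    def __iter__(self):
        for i in range(0, self.data.size(0) - 1, self.bptt):
            yield self.get_batch(i)
```

The trade-off is a host-to-device copy per batch; pinning the corpus (`tensor.pin_memory()`) and passing `non_blocking=True` to `.to()` can hide most of that latency.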
I have a machine with six Titan GPUs, 12 GB of memory each, and I changed the code to train on my own dataset. However, I always get CUDA out of memory:
It doesn't matter what I change; I reduced the model size and the target length, and even added batch chunking. Here is my bash file:
It seems the script tries to load the whole data file into GPU memory at once.
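One way to confirm that suspicion is to measure allocated device memory before and after building the data tensor. A quick sketch; the corpus size and batch size here are made-up stand-ins:

```python
import torch

# Hypothetical stand-in corpus: ~100M token ids (int64), roughly 0.75 GiB.
data = torch.randint(0, 32000, (100_000_000,))
bsz = 60

before = torch.cuda.memory_allocated()
# What the original line 29 effectively does: push the entire corpus to the GPU.
corpus_on_gpu = data[:data.size(0) // bsz * bsz].view(bsz, -1).t().contiguous().to('cuda')
after = torch.cuda.memory_allocated()

print(f'corpus alone occupies {(after - before) / 1024**3:.2f} GiB of GPU memory')
```

With a real dataset several times that size, the corpus tensor alone can exhaust a 12 GB card before the model has even allocated its parameters.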