Closed: supersyq closed this issue 1 year ago
Hi, I actually use the same GPU card and didn't experience any issues. I am not sure this is really a CUDA out-of-memory (OOM) problem; I would remove the try ... except and see what the actual error is. If it really is a CUDA memory problem, then you can decrease the batch_size and proportionally increase iter_size so that batch_size times iter_size stays the same. This guarantees that each optimisation step computes gradients over the same number of samples.
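For reference, here is a minimal sketch of the gradient-accumulation loop described above. The model, data, and hyper-parameter values are toy placeholders, not this repo's actual training script; only the accumulation pattern itself is the point.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; the real model and data come from the repo's training script.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
loader = DataLoader(dataset, batch_size=4)  # e.g. batch_size halved from 8 to 4

iter_size = 2  # doubled, so batch_size * iter_size is unchanged (8 samples/step)

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(loader):
    loss = criterion(model(inputs), targets)
    # Scale the loss so the accumulated gradient matches the average over
    # batch_size * iter_size samples, as with the original batch size.
    (loss / iter_size).backward()
    if (i + 1) % iter_size == 0:  # one optimisation step per iter_size mini-batches
        optimizer.step()
        optimizer.zero_grad()
```

Without the loss / iter_size scaling, the accumulated gradient would be iter_size times larger than the single-large-batch gradient, which effectively changes the learning rate.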
Closed due to inactivity.
Hi, thanks for sharing this wonderful project. I am currently training the network by running:
After two epochs of training, the following errors occur:
It seems that all iterations are skipped because CUDA runs out of memory. I also tried setting the batch size to 2, and the same problem occurred. Do you have any suggestions for solving this? (The training environment is Python 3.8.8, PyTorch 1.12.0+cu116, and an NVIDIA TITAN RTX GPU.) Besides, when I changed the hyper-parameter iter_size from 1 to 2, the problem seemed to be solved, but I am worried that this may negatively affect model training and that I will not be able to reproduce the experimental results in the paper. Could you share more details about the parameter settings used for training the model?