octree-nn / ocnn-pytorch

Octree-based 3D Convolutional Neural Networks
MIT License

Increasing size of training dataset (same batch size) results in OOM error. #18

Closed harryseely closed 1 year ago

harryseely commented 1 year ago

Hello,

I am currently using a batch size of 8 with around 2000 samples in my training set. I have noticed that, keeping all other hyperparameters constant, if I increase the size of my training dataset to 4000 samples, I run into out-of-memory errors very near the end of the epoch (around 95% finished).

Do you have any idea why increasing the number of samples might result in OOM even when the batch size remains the same?

I am using a DGCNN model with the same training loop and hyperparameters and do not run into this issue, so I am wondering if it is OCNN-specific.
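For what it's worth, here is roughly how I was planning to narrow it down: logging the peak GPU memory per step to see whether one unusually large batch is responsible or whether memory creeps up over the epoch. This is just a sketch; `model`, `loader`, `criterion`, and `optimizer` are placeholders for my actual objects, and the batch unpacking would need to match my dataset.

```python
import torch

# Sketch: record peak GPU memory per training step. A sudden spike points to
# one oversized batch; a steady climb points to something accumulating.
for step, (inputs, labels) in enumerate(loader):
    torch.cuda.reset_peak_memory_stats()

    inputs, labels = inputs.cuda(), labels.cuda()  # adapt to the actual batch format
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    peak_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
    if step % 50 == 0 or peak_mb > 8000:  # arbitrary reporting threshold
        print(f"step {step}: peak GPU memory {peak_mb:.0f} MB")
```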

Let me know what you think.

Thanks,

Harry

wang-ps commented 1 year ago

There must be something wrong in your code.

For your reference (https://ocnn-pytorch.readthedocs.io/en/latest/notes/classification.html): on the ModelNet40 dataset, with a batch size of 32 and 8000 points per sample, the memory consumption of LeNet is less than 700 MB, and that of ResNet is less than 3 GB. Please compare your code with these examples to debug the issue.
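One generic thing worth double-checking, since the error appears only after more iterations rather than with a larger batch: make sure you are not accumulating the loss (or other model outputs) as tensors across the epoch, because that keeps autograd history alive and memory can grow with every step. Below is a minimal sketch of the difference; it is not specific to ocnn-pytorch, and `model`, `loader`, `criterion`, and `optimizer` are placeholders.

```python
# Problematic: `epoch_loss` is a tensor, so each `+=` keeps a reference to that
# batch's autograd history, and GPU memory can grow over the epoch.
epoch_loss = 0
for inputs, labels in loader:
    loss = criterion(model(inputs.cuda()), labels.cuda())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    epoch_loss += loss

# Safer: convert to a plain Python number before accumulating.
epoch_loss = 0.0
for inputs, labels in loader:
    loss = criterion(model(inputs.cuda()), labels.cuda())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    epoch_loss += loss.item()
```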

harryseely commented 1 year ago

OK, I will continue to investigate and debug my code. Thank you for the input.