Hi,
I have never observed this issue, but it is also true that I have always had more than 16 GB of memory. Can you try to locate the memory leak, or train on a machine with 32 GB of RAM?
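One way to narrow down where the memory grows (a minimal sketch, not part of the repository; psutil is an extra dependency and the function name is a placeholder) is to log the resident memory of the training process at fixed points in the loop:

```python
import os
import psutil

process = psutil.Process(os.getpid())

def log_memory(step, tag=""):
    # Resident set size of the current process, in megabytes
    rss_mb = process.memory_info().rss / (1024 ** 2)
    print(f"step {step:6d} {tag}: RSS = {rss_mb:.1f} MB")
```

Calling this, for example, right after the dataloader yields a batch and again after the backward pass should show which phase of the loop is responsible for the steady growth.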
For the other issue that you posted: I have uploaded the preprocessing scripts that we used for semantic_kitti and the other datasets.
Thank you very much @zgojcic! I will look into the possible memory leaks, and thanks for the preprocessing scripts.
For the memory issue, I added .detach() on lines 145 and 151 in train.py, following a suggestion on the PyTorch forum about storing the complete computation graph when accumulating losses. I am still not sure this is the cause, but I will try running the training with this modification.
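For context, here is a minimal self-contained sketch of the pattern in question (not the actual train.py code; the model, data, and variable names are placeholders). Accumulating the raw loss tensor keeps every iteration's autograd graph alive, while detaching it (or using .item()) lets the graph be freed:

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

running_loss = 0.0
for step in range(1000):
    x = torch.randn(4, 8)
    y = torch.randn(4, 1)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

    # running_loss += loss          # retains the whole graph each step -> memory grows
    running_loss += loss.detach()   # or: running_loss += loss.item(); graph is freed
```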
After some more time searching on the web, I found this, which could be a possible explanation. Maybe it has to do with the dataloader workers iterating over Python lists and dicts, whose memory usage adds up over time? The suggested solution is to replace them with numpy arrays.
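The pattern the linked post suggests would look roughly like the sketch below (hypothetical class and field names, not the repository's actual dataset code): store the per-sample index as a flat numpy array rather than a Python list of strings, so that worker processes do not gradually copy it through refcount-triggered copy-on-write.

```python
import numpy as np
from torch.utils.data import Dataset

class SampleListDataset(Dataset):
    """Sketch: keep the sample index as a numpy array instead of a Python list."""

    def __init__(self, file_list):
        # self.files = file_list                              # Python list: RSS grows per worker
        self.files = np.array(file_list).astype(np.bytes_)    # contiguous numpy array of bytes

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        path = self.files[idx].decode("utf-8")
        # load and return the training sample for `path` here
        return path
```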
Hi @Alt216, this could indeed be the case. At the moment I do not have time to investigate it (especially as it works fine on machines with more RAM), but if you find a solution it would be great if you could open a PR.
Best,
Zan
Closing due to inactivity.
Hi, when I run
python train.py ./configs/train/train_weakly_supervised.yaml
to train the network from scratch on our dataset, my system memory usage slowly increases until it maxes out the system memory and the training crashes. I have 16 GB of system memory, and training can only run for a little more than one epoch with ~16000 training samples. I tried lowering num_workers to 4 and the batch size to 2, but neither seemed to resolve the issue.