Closed: smoothumut closed this issue 2 years ago
Hi
It shouldn't use that much memory. I'm training on a single RTX 2080 with 8 GB of RAM. Also, I don't see why changing the number of scenes would affect GPU memory usage. I'm not sure what's going on there. Let me know if you find out.
BR, Rasmus
Hi, thanks for the quick response.
I have noticed that there is a memory leak in surface_embedding.py. I guess it is the logging during training, which increases the memory usage every 4-5 seconds; I checked it with the "nvidia-smi" command while training.
So I have commented out these 3 lines, and now it even works with a batch size of 32 and multiple worker threads :)
self.log(f'{log_prefix}/loss', loss)
self.log(f'{log_prefix}/mask_loss', mask_loss)
self.log(f'{log_prefix}/nce_loss', nce_loss)
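If it's useful for reproducing this, the growth can also be watched from inside the process instead of with nvidia-smi. This is only a rough sketch (the helper name and the printing are mine, not something from the repo):

```python
import torch

def report_gpu_memory(tag=''):
    # Hypothetical helper: prints what the PyTorch CUDA allocator currently holds,
    # which is where tensors retained by the logged losses would show up.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024 ** 2
        reserved = torch.cuda.memory_reserved() / 1024 ** 2
        print(f'{tag} allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB')
```

Calling it at the end of each training step shows whether the allocated number keeps climbing.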
I hope disabling the logging won't break things.
Thanks again for the great work!
That's weird. Disabling the logging won't break anything, but if you want logging, try updating pytorch lightning in case it's a bug in a specific version, or call .item() on the losses before logging them, so the logger gets plain floats instead of tensors that might keep the computation graph alive.
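For reference, that would look something like this (a sketch based on the three lines quoted above; I haven't checked whether a particular Lightning version needs anything more):

```python
# .item() converts each loss tensor to a plain Python float, so the logger
# cannot accidentally keep the loss's computation graph (and its GPU tensors) alive.
self.log(f'{log_prefix}/loss', loss.item())
self.log(f'{log_prefix}/mask_loss', mask_loss.item())
self.log(f'{log_prefix}/nce_loss', nce_loss.item())
```

The logging calls themselves stay the same; only the values passed in change.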
Hi, thanks for your great work. I hope I can make it work and use it on my custom dataset. My problem is this: I have one 2080 Ti and I am trying to train on the T-LESS PBR dataset, but I get a "CUDA out of memory" error. I have used a smaller batch size of 8 and decreased the number of workers to 0, but it keeps giving the error (it now appears later than before, but it still appears).
It only works if I decrease the number of scenes in the train_pbr folder from 50 to 1; otherwise there is no chance.
Is this normal behavior with this one GPU, or am I missing something?
Thanks in advance
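In case it helps anyone hitting the same error: a generic memory-saving combination I have seen with Lightning is a small per-step batch together with gradient accumulation and mixed precision. This is just a sketch; argument names can differ between Lightning versions, and I don't know whether the repo's train script exposes them:

```python
import pytorch_lightning as pl

# Hypothetical Trainer settings, not taken from the repo:
trainer = pl.Trainer(
    accumulate_grad_batches=4,  # 4 steps of batch size 8 gives an effective batch of 32
    precision=16,               # mixed precision roughly halves activation memory
)
```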