nashory / pggan-pytorch

:fire::fire: PyTorch implementation of "Progressive growing of GANs (PGGAN)" :fire::fire:

Memory leak during network growing? #16

Open iwtw opened 6 years ago

iwtw commented 6 years ago

Thanks for providing your code, it's much more readable than the original one.
I have observed a severe memory leak during training.
During training, the batch tensors allocated for the earlier, smaller networks are never freed from GPU memory.

I noticed that you use a very small batch size to work around this, but it makes training painfully slow :(.
Have you found any better solution?
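
A minimal way to check whether memory from earlier growth phases really persists is to log allocated GPU memory around each resolution transition. This is only a hedged sketch using PyTorch's standard memory introspection calls; `grow_network` is a placeholder, not an identifier from this repo:

```python
import torch

def log_gpu_memory(tag):
    # memory_allocated() counts bytes held by live tensors;
    # max_memory_allocated() tracks the peak since the last reset.
    alloc = torch.cuda.memory_allocated() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f'[{tag}] allocated: {alloc:.1f} MiB, peak: {peak:.1f} MiB')

# Hypothetical usage around a resolution transition in the trainer:
# log_gpu_memory('before grow')
# grow_network()              # placeholder for the repo's growing step
# torch.cuda.empty_cache()    # release cached blocks left over from the smaller net
# log_gpu_memory('after grow')
```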

iwtw commented 6 years ago

I'm sorry, the memory issue is not caused by a memory leak but by the fixed batch size of z_test. The z_test batch size is exactly the training batch size of the initial resolution, so increasing the initial-resolution batch size will likely cause memory problems later in training at higher resolutions. It's easy to tackle.
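
A hedged sketch of the fix described here, with hypothetical names (`batch_table`, `renew_z_test`, `self.resl`, `self.nz` are illustrative stand-ins, not necessarily this repo's actual identifiers): re-sample z_test with the current resolution's batch size whenever the network grows, so the fixed test noise no longer pins the initial-resolution batch size in GPU memory:

```python
import torch

# Hypothetical resolution-to-batch-size table; the real values live in the
# repo's config, these numbers are only for illustration.
batch_table = {4: 64, 8: 32, 16: 16, 32: 8, 64: 4, 128: 4, 256: 2, 512: 2, 1024: 1}

def renew_z_test(resolution, nz, device='cuda'):
    # Re-sample the fixed evaluation noise with the batch size of the
    # *current* resolution instead of keeping the initial-resolution batch.
    batch = batch_table[resolution]
    return torch.randn(batch, nz, device=device)

# Called whenever the trainer grows the network, e.g.:
# self.z_test = renew_z_test(self.resl, self.nz)
```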

nashory commented 6 years ago

Oh, I see. Maybe we need to change the batch size of z_test according to the resolution to avoid excessive memory use at higher resolutions. Thanks for reporting that :) I also found that the data loader is the bottleneck that harms the overall training speed. Do you have any idea about this?

iwtw commented 6 years ago

Doing the preprocessing offline might help a little? You are currently doing the preprocessing on the CPU online, which is time-consuming.
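
A hedged sketch of what offline preprocessing could look like: resize the dataset to each training resolution once, cache the results on disk, and let the DataLoader read the already-resized images with several worker processes. Paths, resolutions, and the directory layout here are illustrative assumptions, not the repo's actual pipeline:

```python
import os
from PIL import Image

def preprocess_offline(src_dir, dst_root, resolutions=(4, 8, 16, 32, 64, 128, 256)):
    # Resize every image once per resolution and cache it on disk, so the
    # training loop only has to load small, already-resized files.
    for res in resolutions:
        dst_dir = os.path.join(dst_root, str(res))
        os.makedirs(dst_dir, exist_ok=True)
        for name in os.listdir(src_dir):
            img = Image.open(os.path.join(src_dir, name)).convert('RGB')
            img.resize((res, res), Image.LANCZOS).save(os.path.join(dst_dir, name))

# At train time, point an ImageFolder-style dataset at dst_root/<resolution>
# and use multiple workers so decoding overlaps with GPU work, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=batch,
#                                      num_workers=4, pin_memory=True)
```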

coralreefman commented 6 years ago

Hey, I think I'm getting an 'out of memory' message for the same reason after 4 resolutions, even if I change the 4th resolution's batch size to 1. However, I'm not sure how to change the batch size of z_test without hardcoding it. Did you already come up with any solutions? Thanks!
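
One way to keep the test-sample memory bounded without hardcoding a per-resolution batch size is to run the generator over z_test in small chunks at evaluation time. This is only a sketch under assumed names (`G` and `z_test` are placeholders for the repo's generator and fixed test noise):

```python
import torch

@torch.no_grad()
def generate_in_chunks(G, z_test, chunk=4):
    # Run the generator a few samples at a time so the peak activation
    # memory is set by `chunk`, not by the full test batch size.
    outputs = []
    for start in range(0, z_test.size(0), chunk):
        outputs.append(G(z_test[start:start + chunk]).cpu())
    return torch.cat(outputs, dim=0)

# Hypothetical usage when saving intermediate samples:
# fake = generate_in_chunks(G, z_test, chunk=4)
# torchvision.utils.save_image(fake, 'samples.png', nrow=4, normalize=True)
```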

mlandcv commented 6 years ago

Same error. Any solutions yet?

nashory commented 6 years ago

Hi, I found that several issues occur during training, so I'm planning to refactor the entire code soon. Thanks :-)