iwtw opened this issue 6 years ago (status: Open)
I'm sorry, the memory issue is not caused by a memory leak but by the fixed batch size of z_test. The z_test batch size is exactly the training batch size of the initial resolution, so increasing the batch size at the initial resolution probably causes memory problems during the later training at higher resolutions. It should be easy to tackle.
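As a rough sketch of what I mean, z_test could be regenerated with a resolution-dependent batch size instead of the initial one. The `batch_table` values and names below are just illustrative, not the repo's actual config:

```python
import torch

# Hypothetical mapping from resolution to training batch size
# (illustrative values, not taken from this repository's config).
batch_table = {4: 64, 8: 32, 16: 16, 32: 8, 64: 4, 128: 4, 256: 2}

def make_z_test(resolution, nz=512, device='cuda'):
    # Size the fixed test-noise batch by the *current* resolution's
    # batch size rather than the initial resolution's, so it never
    # occupies more GPU memory than a training batch at this stage.
    batch = batch_table[resolution]
    return torch.randn(batch, nz, device=device)

# Regenerate z_test whenever the network grows to a new resolution.
z_test = make_z_test(resolution=64)
```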
Oh, I see. Maybe we need to change the batch size of z_test according to the resolution to avoid excessive memory use at higher resolutions. Thanks for reporting that :) I found that the data loader is the bottleneck that harms the overall training speed. Do you have any ideas about this?
Doing the preprocessing offline might help a little? Right now the preprocessing is done on the CPU online, which is time consuming.
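For example, the images could be resized once per resolution ahead of training, so the data loader only reads pre-sized files. This is just a sketch; the paths and resolutions are hypothetical:

```python
import os
from PIL import Image

def cache_resized(src_dir, dst_dir, resolution):
    """One-off offline pass: resize every image once and save it,
    so the training data loader only reads pre-sized files instead
    of resizing on the CPU at every iteration."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        img = img.resize((resolution, resolution), Image.LANCZOS)
        img.save(os.path.join(dst_dir, name))

# Prepare a cache for every resolution used during progressive growing.
for res in (4, 8, 16, 32, 64, 128, 256):
    cache_resized('data/celeba', 'data/celeba_%dx%d' % (res, res), res)
```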
Hey, I think I'm getting an 'out of memory' message for the same reason after 4 resolutions, even if I change the 4th resolution's batch size to 1. However, I'm not sure how to change the batch size of z_test without hardcoding it. Did you already come up with any solutions? Thanks!
same error. any solutions yet?
Hi, I found that several issues occur during the training, so I'm planning to refactor the entire code soon. Thanks :-)
Thanks for providing your code, it's much more readable than the original one.
I have observed a severe memory leak during training.
As training progresses, batch tensors previously allocated for the smaller networks are never removed from GPU memory.
I notice that you use a very small batch size to work around this, but it makes the training painfully slow :(
Have you found any better solution?
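For reference, this is the kind of cleanup I have in mind between resolution stages. It's only a sketch with illustrative names, not tied to this repo's code:

```python
import gc
import torch

def grow_step_cleanup(stale_tensors):
    """Drop references to batch tensors allocated for the smaller
    network before training at the next resolution starts, then ask
    PyTorch's caching allocator to release the now-unused blocks."""
    stale_tensors.clear()          # remove the Python references
    gc.collect()                   # collect any cycles still holding them
    torch.cuda.empty_cache()       # return cached GPU blocks to the driver

# Usage sketch: keep the per-resolution batch tensors in a list so they
# can all be released at once when the resolution increases.
stale = [torch.randn(16, 3, 4, 4, device='cuda')]
grow_step_cleanup(stale)
print(torch.cuda.memory_allocated())   # should drop after the cleanup
```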