danieltudosiu closed this 4 years ago
I have solved it. The cause is a bad interplay between the way torch.optim.Optimizer.zero_grad() releases gradients (the official PyTorch way, as opposed to the Apex way), which is not very efficient, and the large number of gradients the adaptive loss computes. I solved it by calling gc.collect() at the end of each pass.
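For anyone hitting the same thing, here is a minimal sketch of the workaround. It is not the exact code from my project: `train_one_pass` and its arguments are hypothetical names, and I am assuming that "a pass" means one loop over the batches. The objects only need to follow the usual PyTorch conventions (`zero_grad()`, `backward()`, `step()`, `item()`), so the sketch itself imports nothing but `gc`.

```python
import gc


def train_one_pass(model, optimizer, loss_fn, batches):
    """Run one pass over `batches`, then force a garbage collection.

    Assumption: `model`, `optimizer` and `loss_fn` behave like their
    PyTorch counterparts (callable model, optimizer.zero_grad()/step(),
    a loss object exposing backward() and item()).
    """
    total_loss = 0.0
    for inputs, targets in batches:
        optimizer.zero_grad()                    # clear gradients from the previous step
        loss = loss_fn(model(inputs), targets)   # forward pass
        loss.backward()                          # backward pass
        optimizer.step()                         # parameter update
        total_loss += loss.item()                # keep a plain float, not the loss object

    # Workaround: explicitly collect unreachable objects (including
    # reference cycles) once the pass is finished.
    unreachable = gc.collect()
    return total_loss, unreachable
```

If memory still grows within a single pass, the same `gc.collect()` call could be moved inside the loop, at the cost of some speed.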
I have generalised your loss to grayscale volumes (MRI images), and I get memory leaks with any optimizer except the fused Adam from Apex.
I generalised it by adding the following to adaptive.py:

    def volume_idct(dct_x):
        """Inverts image_dct(), by performing a type-III DCT."""
        return torch_dct.idct_3d(torch.as_tensor(dct_x), norm="ortho")