Closed: Etienne66 closed this issue 2 years ago
The commit doesn't actually change the behavior, it just makes the code easier to read. `rbot0` and `rbot1` are modified in place, so even though in the previous version we put the `save_for_backward` before populating `rbot0` and `rbot1`, the saved tensors weren't just zeros. You can verify this using gradcheck, for example: change all `float` to `double` in the `correlation.py` and run the following code, then change the location of `save_for_backward` and try again. The gradcheck will pass both times. As for why I don't use comments, I have the philosophy that if your code requires comments then the code itself isn't written well enough (just my opinion though, I am well aware that the readability could still be improved with comments).
```python
import torch

import sys; sys.path.insert(0, './correlation'); import correlation

class Network(torch.nn.Module):
    def __init__(self):
        super().__init__()
    # end

    def forward(self, tenOne, tenTwo):
        return correlation.FunctionCorrelation(tenOne=tenOne, tenTwo=tenTwo)
    # end
# end

netTest = Network().cuda()

tenOne = torch.randn(2, 4, 10, 10).double().cuda().requires_grad_()
tenTwo = torch.randn(2, 4, 10, 10).double().cuda().requires_grad_()

torch.autograd.gradcheck(func=netTest, inputs=tuple([tenOne, tenTwo]))
```
I'm not quite buying that theory that `save_for_backward` lets you modify the values afterwards. The reason I say this is that my epochs were taking 29 hours, but after this change they are now taking 30 hours. Something is taking longer to compute, and this was the only change I made.
> I'm not quite buying that theory that `save_for_backward` lets you modify the values afterwards.
If you don't believe me then I encourage you to give it a try yourself. Print the average of `rbot0` in the backward function and play around with the implementation. `rbot0` (and `rbot1`) will be nonzero regardless of where `save_for_backward` is called, but if you instead remove the two calls to `kernel_Correlation_rearrange` then `rbot0` (and `rbot1`) will be zero.
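The experiment described above can be boiled down to a minimal CPU sketch (the name `FillAfterSave` is made up for illustration, and the raw numpy write stands in for the CUDA kernel launch, which writes through a raw pointer that autograd does not track):

```python
import torch

class FillAfterSave(torch.autograd.Function):
    """Toy function computing 2 * input, populating its buffer after saving it."""

    @staticmethod
    def forward(ctx, ten):
        buf = ten.new_zeros(ten.shape)
        ctx.save_for_backward(buf)  # buf is still all zeros at this point
        # Raw-memory write through numpy, standing in for the cupy kernel
        # launch: autograd sees neither, yet the saved tensor is updated,
        # because save_for_backward keeps a reference rather than a copy.
        buf.numpy()[:] = 2.0 * ten.detach().numpy()
        return buf.clone()

    @staticmethod
    def backward(ctx, grad):
        buf, = ctx.saved_tensors
        assert buf.abs().sum().item() > 0.0  # nonzero despite being saved as zeros
        return grad * 2.0

x = torch.randn(4, requires_grad=True)
FillAfterSave.apply(x).sum().backward()
print(x.grad)  # all twos, the gradient of sum(2 * x)
```

The assertion in `backward` fires only if the saved tensor were still zeros, which it isn't, mirroring the behavior of `rbot0`/`rbot1` with the rearrange kernels in place.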
It isn't so much that I don't believe you; I know you have more experience with this than I do. I just don't understand why that change would be more computationally expensive if the value is allowed to be modified after the `save_for_backward`. Plus, the default for `new_zeros` is `requires_grad=False`, which is what is declared right before the original `save_for_backward`, so a grad check on that tensor wouldn't show anything. Besides, I always thought in-place variable changes applied to the tensors being passed in, not the ones being passed out. It seems like `save_for_backward` would retain the value at the time it was called, but I can't find much documentation on it.
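For what it's worth, the `new_zeros` default is easy to check in isolation: the result inherits the dtype and device of the source tensor, while `requires_grad` defaults to `False` regardless of the source.

```python
import torch

src = torch.randn(2, 2, dtype=torch.float64, requires_grad=True)
buf = src.new_zeros(3, 3)

print(buf.requires_grad)  # False: not inherited from src
print(buf.dtype)          # torch.float64: inherited from src
```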
The one thing I do see is that `mark_dirty` must be used to mark any input that is modified in place by the forward function. It would make more sense if that is supposed to be used on `rbot0` and `rbot1`. I didn't really understand the use of `mark_dirty` and I could not find a single example.
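A minimal hypothetical example of `mark_dirty` (not taken from `correlation.py`): it applies to *inputs* that `forward` mutates in place, which is why it would not fit `rbot0` and `rbot1`, since those are created inside `forward` rather than passed in.

```python
import torch

class DoubleInPlace(torch.autograd.Function):
    """Toy function that doubles its input tensor in place."""

    @staticmethod
    def forward(ctx, ten):
        ten.mul_(2.0)        # mutate the *input* in place
        ctx.mark_dirty(ten)  # tell autograd the input's values changed
        return ten

    @staticmethod
    def backward(ctx, grad):
        return grad * 2.0

x = torch.randn(4, requires_grad=True)
y = DoubleInPlace.apply(x * 1.0)  # non-leaf copy, since leaf tensors cannot be mutated
y.sum().backward()
print(x.grad)  # all twos
```

Without the `mark_dirty` call, autograd would not know that the returned tensor is the mutated input, and the saved computation graph could silently use stale values.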
I am afraid that I don't know what happens in PyTorch internally. Both versions of `save_for_backward` produce the same result though, so if one is faster than the other for you, there is nothing that should hold you back from using the faster one. :slightly_smiling_face:
Well, as long as you are sure. I still wonder why the computations are taking longer, though. It certainly seemed like it made a big difference in my loss total as well. I'm a PL/SQL developer in my professional life and I'm still very new to Python and PyTorch. I appreciate all of the information, @sniklaus.
On commit 5f4d7def149c71ee1610c527e5502264e847c940 in `correlation.py` you moved the `save_for_backward` to the end, just before the `return` in the `forward` definition for the class `_FunctionCorrelation`. I am guessing that is because `rbot0` and `rbot1` have only zeros at the place in the code where the `save_for_backward` was before. I haven't finished training my model, which uses your model as well, but it is definitely taking longer and using the CPU a lot more. I'm guessing that is because the `backward` finally has some data for `rbot0` and `rbot1`.

Not sure why you don't like putting comments in your code nor in your commits as to why something was changed, but a lot of PhD students seem to be that way. I'm a programmer and I get chewed out by my boss if I do that. Oh well... I really do appreciate you improving your code, so I can't complain too much :smile: