sniklaus / pytorch-pwc

a reimplementation of PWC-Net in PyTorch that matches the official Caffe version
GNU General Public License v3.0

A backward issue #27

Closed JiaBob closed 4 years ago

JiaBob commented 4 years ago

Hi sniklaus. I ran into a problem when trying to use your model as the optical flow prediction component of my own model. I have simplified the issue as follows:

import torch

t1 = torch.randn((16, 3, 256, 256))
t2 = torch.randn((16, 3, 256, 256))
input = Network()(t1, t2)  # forward pass through the network
input.backward()           # raises the RuntimeError below

It raises "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" on the backward line. Does this mean something inside the network is non-differentiable? How can I deal with this if I just want to use the output to train another model?
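One way to narrow this down is to inspect the output before calling backward; a minimal diagnostic sketch, assuming the Network class from this repository:

import torch

out = Network()(torch.randn(16, 3, 256, 256), torch.randn(16, 3, 256, 256))
print(out.requires_grad)  # False would match the error above: nothing is being tracked
print(out.grad_fn)        # None means the autograd graph was never built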

sniklaus commented 4 years ago

I am afraid I am a little confused: I see no loss function in your code. If there is nothing you are optimizing, then backpropagation has no gradients to propagate.
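For instance, a minimal sketch of what that could look like (the target tensor here is arbitrary, purely for illustration):

import torch

pred = Network()(t1, t2)                           # forward pass
target = torch.randn_like(pred)                    # arbitrary target, illustration only
loss = torch.nn.functional.mse_loss(pred, target)  # reduce the output to a scalar loss
loss.backward()                                    # now there is something to propagate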

JiaBob commented 4 years ago

Sorry, my oversimplified example made things confusing. If the problem were a missing loss function, I would expect a different error. My actual code looks like:

pred_flow = estimate(fake_real_frame1, fake_real_frame2, mode="tensor")
op_loss = mse(pred_flow, flow)
op_loss.backward()

The estimate function is similar to the one in your code. It may be an environment problem on my end, since the same error is raised even for sample code from the PyTorch tutorials.

sniklaus commented 4 years ago

Are you using an old version of PyTorch? It used to be that you had to wrap tensors in Variables to be able to use backpropagation. That would explain the error.
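For reference, the legacy pattern from before PyTorch 0.4 looked roughly like this (no longer needed in current versions):

import torch
from torch.autograd import Variable  # deprecated since PyTorch 0.4

t1 = Variable(torch.randn(16, 3, 256, 256), requires_grad=True)
t2 = Variable(torch.randn(16, 3, 256, 256), requires_grad=True)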

JiaBob commented 4 years ago

I am using the latest version of PyTorch, so there should be no need to wrap tensors in Variables.

sniklaus commented 4 years ago

You could check print(torch.__version__) to see whether your Python environment is also loading the most recent PyTorch version that you have installed.
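For example, something along these lines shows both which interpreter is running and which PyTorch installation it loads:

import sys
import torch

print(sys.executable)     # the Python interpreter in use
print(torch.__version__)  # the PyTorch version it imports
print(torch.__file__)     # where that installation lives on disk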

JiaBob commented 4 years ago

I checked, and it is the latest version. I am running the model on a cluster, which is sometimes unstable. If you can confirm that there is no backward problem on your machine, let me know and I will keep investigating on my side.

sniklaus commented 4 years ago

import torch

torch.set_grad_enabled(True)
moduleNetwork = Network().cuda().train()

t1 = torch.randn(16, 3, 256, 256).cuda()
t2 = torch.randn(16, 3, 256, 256).cuda()
truth = torch.randn(16, 2, 64, 64).cuda()  # the network predicts flow at 1/4 resolution

result = moduleNetwork(t1, t2)
loss = torch.nn.functional.l1_loss(result, truth)
loss.backward()

Not pretty, but it works; this seems like an issue on your end.

JiaBob commented 4 years ago

Alright, thanks

JiaBob commented 4 years ago

After half a day, I finally found the reason: there is torch.set_grad_enabled(False) at the beginning of your run.py, and I simply imported your network from run.py, so that call ran at import time and disabled gradient tracking for my whole model. A good lesson.
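For anyone who runs into the same thing: a minimal sketch of the workaround, assuming the network is imported from run.py as in this thread:

import torch
from run import Network  # importing run.py executes torch.set_grad_enabled(False)

torch.set_grad_enabled(True)  # re-enable gradient tracking after the import
moduleNetwork = Network().cuda().train()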