vt-vl-lab / flownet2.pytorch

Off-the-shelf FlowNet module in PyTorch-0.3.0
Apache License 2.0

Trouble propagating a custom gradient through FlowNet2 model #6

Open MatthewInkawhich opened 6 years ago

MatthewInkawhich commented 6 years ago

I am trying to back-propagate a custom gradient tensor through the FlowNet2 model. I know that this is possible in PyTorch using the following methodology:

model = Net()
model.load_state_dict(torch.load('./mnist_saved_model.pth'))
model.eval()
output = model(img)
output.backward(custom_grad)
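
The key contract here is that the tensor passed to backward must have the same shape as the output it is called on. A minimal self-contained sketch with a toy conv layer standing in for Net() (all names and shapes below are hypothetical, PyTorch-0.3-style Variable API):

import torch
import torch.nn as nn
from torch.autograd import Variable

# toy stand-in for Net(); all shapes here are hypothetical
net = nn.Conv2d(3, 2, kernel_size=3, padding=1)
img = Variable(torch.randn(1, 3, 8, 8), requires_grad=True)

output = net(img)                          # shape (1, 2, 8, 8)
custom_grad = torch.randn(output.size())   # must match output's shape
output.backward(custom_grad)               # injects custom_grad as dL/d(output)
print(img.grad.size())                     # (1, 3, 8, 8): gradient reached the input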

I am trying to replicate this with FlowNet2. Here is the relevant code snippet:

### Initialize FlowNet model
flownet_saved_model = "/root/checkpoints/FlowNet2_train-checkpoint.pth.tar"
flownet_model = FlowNet2()
pretrained_dict = torch.load(flownet_saved_model)['state_dict']
model_dict = flownet_model.state_dict()
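# keep only the checkpoint entries whose keys also exist in this model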
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
flownet_model.load_state_dict(model_dict)
flownet_model.cuda()
flownet_model.eval()

...

### Push image pair through flownet in forward pass
curr_ims = torch.from_numpy(curr_ims)
curr_ims_v = Variable(curr_ims.cuda().float(), requires_grad=True)
curr_flownet_out = flownet_model(curr_ims_v)

### Inject a custom gradient tensor for this image pair
custom_grad = torch.FloatTensor(np.random.rand(1,2,128,192)).cuda()
curr_flownet_out.backward(custom_grad)
...

However, when I attempt to run this, I encounter an error at the line curr_flownet_out.backward(custom_grad):

Traceback (most recent call last):
  File "test.py", line 220, in <module>
    curr_flownet_out.backward(custom_grad)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/root/pytorch_flownet2/FlowNet2_src/models/components/ops/channelnorm/functions/channelnorm.py", line 25, in backward
    grad_input1.data, ctx.norm_deg)
AttributeError: 'ChannelNormFunctionBackward' object has no attribute 'norm_deg'

Any ideas as to how I can successfully use PyTorch's autograd feature to propagate a custom gradient tensor through FlowNet2?

Thanks!

PK15946 commented 6 years ago

Hi, I encountered the same issue, and I fixed it by using this. Hope it can help you!

Yuliang-Zou commented 6 years ago

Hi @PK15946 , thanks for the information. Do you want to make a pull request?

liuqk3 commented 6 years ago

@MatthewInkawhich Hi, have you fixed this problem?

liuqk3 commented 6 years ago

@PK15946 Hi, thanks for your information, but I cannot open the link you provided. Do you remember how you fixed this problem?

MatthewInkawhich commented 6 years ago

@liuqk3 Hi, no, I did not end up using this repo. The code I was trying to run worked with https://github.com/NVIDIA/flownet2-pytorch. This repo is based on NVIDIA's implementation anyway.

liuqk3 commented 6 years ago

@MatthewInkawhich I figured out the reason. There is something wrong in the file ./FlowNet2_src/models/components/ops/channelnorm/functions/channelnorm.py. Compared with the file here, we can see that the definition of the backward function reads ctx.norm_deg, which is never set in the forward function, so we can simply add the statement ctx.norm_deg = norm_deg in the forward function before the return. Then the model can be trained :). But here comes another problem: the model cannot converge with lr = 1e-4 :(, so now I am trying lr = 1e-5.
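
A minimal sketch of that fix, assuming the ctx-based autograd Function style used in channelnorm.py; the real file dispatches to a CUDA kernel, so the pure-PyTorch math below (shown for norm_deg = 2) is only illustrative:

import torch
from torch.autograd import Function

class ChannelNormFunction(Function):
    @staticmethod
    def forward(ctx, input1, norm_deg=2):
        # L2 norm over the channel dimension (the real op runs a CUDA kernel)
        output = input1.pow(2).sum(dim=1, keepdim=True).sqrt()
        ctx.save_for_backward(input1, output)
        ctx.norm_deg = norm_deg  # the missing line: backward reads ctx.norm_deg
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input1, output = ctx.saved_tensors
        # for norm_deg = 2: d||x|| / dx_c = x_c / ||x||
        grad_input1 = grad_output * input1 / output.clamp(min=1e-8)
        return grad_input1, None  # no gradient w.r.t. norm_deg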

AliKafaei commented 5 years ago

> ... the model can be trained :). But here comes another problem: the model cannot converge with lr = 1e-4 :(, so now I am trying lr = 1e-5.

Did decreasing the learning rate help? I encountered the same problem with VGG, and freezing some weights solved it, as sketched below.
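
Something like this, assuming a torchvision VGG (which layers to freeze is just an example):

import torch
import torchvision.models as models

vgg = models.vgg16(pretrained=True)
for param in vgg.features.parameters():
    param.requires_grad = False  # freeze the convolutional features

# optimize only the parameters that still require gradients
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, vgg.parameters()), lr=1e-5)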

luchen828 commented 2 years ago

Hi, can you post the link again? Or your solution? Thank you very much!