philferriere / tfoptflow

Optical Flow Prediction with TensorFlow. Implements "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume," by Deqing Sun et al. (CVPR 2018)
MIT License

Possibly wrong use of flow (Important!) #6

Open rajat95 opened 6 years ago

rajat95 commented 6 years ago

The use of the dense_image_warp function is wrong. In line 194 of core_warp.py, you use query_points_on_grid = batched_grid - flow, but what this line does is a forward warp. In the model, warping image 2 to image 1 implies a backward warp, so this should be query_points_on_grid = batched_grid + flow.

I will try to explain my point. Say the flow at point (i, j) is (fi, fj). This implies im2[i+fi, j+fj] = im1[i, j], i.e. pixel (i, j) in im1 moved to (i+fi, j+fj) in im2. Now, when you warp im2 by the flow, you aim to move the pixels back to their corresponding places in im1, so pixel (i', j') in the output must come from location (i'+fi, j'+fj) in im2, which is the opposite of what this function is doing.

Consider a 256x256 black-and-white image with a white background and a single black dot at the centre, and suppose the flow for that dot is (5, 5). This implies im1[128, 128] = 0 and im2[133, 133] = 0. Now say you are only given the flow and im2, and you need to retrieve im1. Clearly, output[128, 128] = im2[128+5, 128+5]. This is confusing, and I hope this example makes it clear.

This is a core issue that should have impacted the results; why this method trains so well despite it is beyond me.
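
A minimal NumPy sketch of the black-dot example above (not from the repo), using integer flow so plain indexing suffices and no bilinear interpolation is needed:

import numpy as np

im1 = np.ones((256, 256))
im1[128, 128] = 0                    # black dot at the centre of frame 1
im2 = np.ones((256, 256))
im2[133, 133] = 0                    # the dot after moving by (5, 5)
flow = np.zeros((256, 256, 2))
flow[128, 128] = (5, 5)              # flow stored at the dot's frame-1 location

# Backward warp: each output pixel pulls from im2 at (grid + flow).
output = np.ones((256, 256))
output[128, 128] = im2[128 + 5, 128 + 5]
assert output[128, 128] == im1[128, 128] == 0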

alisaaalehi commented 6 years ago

Hi @rajat95, if the flow vector at location (i, j) is (fi, fj), it means that im2[i, j] = im1[i-fi, j-fj]

rajat95 commented 6 years ago

Hi @alisaaalehi, doesn't the equation I wrote imply the same thing? Also, I calculated the flow for [im1, im2] using the model here and warped im2 with that flow. What I noticed was that the resultant image moved in the direction opposite to the flow between im1 and im2, implying it is doing a forward warp. One can easily hack around it by warping im2 with the negated flow (see the sketch below), but since the model itself uses the same warp function to warp im2 to im1, this step is possibly not what the authors of the paper intended. The intention of the warp is to align the two images (by aligning img2 to img1), which I don't think this function does (it aligns img1 to img2).
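
A minimal sketch of the workaround mentioned above, assuming TF 1.x and the stock tf.contrib.image.dense_image_warp, which samples at grid - flow, so negating the flow yields the backward warp:

import tensorflow as tf

im2 = tf.placeholder(tf.float32, [None, 256, 256, 3])   # second image, NHWC
flow = tf.placeholder(tf.float32, [None, 256, 256, 2])  # flow from im1 to im2

# dense_image_warp samples at (grid - flow); negating the flow pulls each
# output pixel from im2 at (grid + flow), i.e. a backward warp toward im1.
warped_im2 = tf.contrib.image.dense_image_warp(im2, -flow)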

philferriere commented 6 years ago

Hi @rajat95 I believe you're correct. What we want is:

warped_im2[b, y, x, c] = im2[b, y + flow[b, y, x, 0], x + flow[b, y, x, 1], c]

not:

warped_im2[b, y, x, c] = im2[b, y - flow[b, y, x, 0], x - flow[b, y, x, 1], c]

as seems to be the case when using the stock tf.contrib.image.dense_image_warp function.

The fix is straightforward: replace query_points_on_grid = batched_grid - flow with query_points_on_grid = batched_grid + flow, as sketched below.
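
A minimal NumPy sketch (not the repo's exact code) of what the fixed line computes, with nearest-neighbour sampling standing in for the bilinear interpolation core_warp.py actually performs:

import numpy as np

def backward_warp_nn(im2, flow):
    """warped[y, x] = im2[y + flow[y, x, 0], x + flow[y, x, 1]], nearest neighbour."""
    h, w = im2.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    # the fix: query points are grid + flow, not grid - flow
    qy = np.clip(np.rint(grid_y + flow[..., 0]).astype(int), 0, h - 1)
    qx = np.clip(np.rint(grid_x + flow[..., 1]).astype(int), 0, w - 1)
    return im2[qy, qx]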

I'll be retraining the models with the fix over the next two weeks and will keep this issue open until then. Meanwhile, as you've also observed, you can keep using the pre-trained models and generate accurate flow predictions (!)

Thanks for getting to the bottom of this, @rajat95!

alisaaalehi commented 6 years ago

I think this equation is the correct one:

warped_im2[b, y, x, c] = im2[b, y - flow[b, y, x, 0], x - flow[b, y, x, 1], c]

and the implementation in tf.contrib.image.dense_image_warp() is also correct. In most of the datasets that I've seen, the flow vector is stored at the position that the pixel is moving to.

Look at this simple example, in which a single black point moves from location [2,6] to [5,10], so the flow vector [3,4] is stored at location (5,10) of the flow map. All other flow vectors are [0,0].

[Figure "flowexplained": a black dot moves from (2,6) in the first image to (5,10) in the second; the flow vector (3,4) is stored at (5,10) in the flow map]

Based on my first comment and @rajat95's second equation, let's see which pixels from the first image should be copied to fill locations [2,6] and [5,10] in the second image if we want to do forward warping:

# im2[i, j] = im1[i - fi, j - fj]
im2[2, 6]  = im1[2 - 0, 6 - 0]    # this copies a white pixel
im2[5, 10] = im1[5 - 3, 10 - 4]   # this copies the black pixel from the first
                                  # image into the warped image, which is correct
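
A small NumPy sketch (illustrative only) contrasting the two conventions under discussion, with the flow stored either at the source pixel or at the destination pixel:

import numpy as np

h, w = 8, 12
im1 = np.ones((h, w)); im1[2, 6] = 0       # black dot at (2, 6) in frame 1
im2 = np.ones((h, w)); im2[5, 10] = 0      # the dot lands at (5, 10) in frame 2

# Convention A (flow stored at the source pixel): im2[p + flow(p)] == im1[p],
# so reconstructing im1 from im2 needs grid + flow.
flow_src = np.zeros((h, w, 2)); flow_src[2, 6] = (3, 4)
assert im2[2 + 3, 6 + 4] == im1[2, 6]

# Convention B (flow stored at the destination pixel): im2[p] == im1[p - flow(p)],
# which is what the stock grid - flow sampling reproduces.
flow_dst = np.zeros((h, w, 2)); flow_dst[5, 10] = (3, 4)
assert im1[5 - 3, 10 - 4] == im2[5, 10]
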
philferriere commented 6 years ago

Per page 4 of NVidia's paper, section "Warping layer":

At the l-th level, we warp features of the second image Im2 toward the first image Im1 using the x2 upsampled flow from the (l+1)-th level:

c_w^l(x) = c_2^l(x + up_2(w^{l+1})(x))

where x is the pixel index, and the upsampled flow up_2(w^{l+1}) is set to zero at the top level.

In NVidia's PyTorch reference code, this is implemented as follows:

# warp an image/tensor (im2) back to im1, according to the optical flow
# x: [B, C, H, W] (im2)
# flo: [B, 2, H, W] flow
def warp(self, x, flo):

    B, C, H, W = x.size()
    # mesh grid
    xx = torch.arange(0, W).view(1,-1).repeat(H,1)
    yy = torch.arange(0, H).view(-1,1).repeat(1,W)
    xx = xx.view(1,1,H,W).repeat(B,1,1,1)
    yy = yy.view(1,1,H,W).repeat(B,1,1,1)
    grid = torch.cat((xx,yy),1).float()

    if x.is_cuda:
        grid = grid.cuda()
    vgrid = Variable(grid) + flo

    # scale grid to [-1,1]
    vgrid[:,0,:,:] = 2.0*vgrid[:,0,:,:]/max(W-1,1)-1.0
    vgrid[:,1,:,:] = 2.0*vgrid[:,1,:,:]/max(H-1,1)-1.0

    vgrid = vgrid.permute(0,2,3,1)
    output = nn.functional.grid_sample(x, vgrid)
    mask = torch.autograd.Variable(torch.ones(x.size())).cuda()
    mask = nn.functional.grid_sample(mask, vgrid)

    mask[mask<0.9999] = 0
    mask[mask>0] = 1

    return output*mask
[...]
warp5 = self.warp(c25, up_flow6*0.625)
warp4 = self.warp(c24, up_flow5*1.25)
warp3 = self.warp(c23, up_flow4*2.5)
warp2 = self.warp(c22, up_flow3*5.0)
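
As a quick sanity check (not from the repo), the snippet below verifies that vgrid = grid + flo samples im2 at x + flow(x), i.e. a backward warp: a dot that moved by (1, 1) between the frames is recovered at its frame-1 location. (The per-level scale factors above, 0.625 = 20/2^5 through 5.0 = 20/2^2, appear to undo the 1/20 flow scaling PWC-Net uses during training, converted to each pyramid level's resolution.)

import torch
import torch.nn.functional as F

B, C, H, W = 1, 1, 8, 8
im2 = torch.zeros(B, C, H, W)
im2[0, 0, 3, 4] = 1.0                        # the dot after moving by (dy, dx) = (1, 1)
flo = torch.ones(B, 2, H, W)                 # constant flow of one pixel in x and y

xx = torch.arange(W).view(1, -1).repeat(H, 1)
yy = torch.arange(H).view(-1, 1).repeat(1, W)
grid = torch.stack([xx, yy]).unsqueeze(0).float()   # [1, 2, H, W], (x, y) order

vgrid = grid + flo                           # sample im2 at x + flow(x)
vgrid[:, 0] = 2.0 * vgrid[:, 0] / max(W - 1, 1) - 1.0
vgrid[:, 1] = 2.0 * vgrid[:, 1] / max(H - 1, 1) - 1.0
out = F.grid_sample(im2, vgrid.permute(0, 2, 3, 1), align_corners=True)

assert out[0, 0, 2, 3] > 0.99                # the dot is back at its im1 location (2, 3)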

@alisaaalehi, in the above, should:

vgrid = Variable(grid) + flo

have been implemented as:

vgrid = Variable(grid) - flo

?

rajat95 commented 6 years ago

"In most of the datasets that I've seen, the flow vector is stored at the position that the pixel is moving to" I am not sure about this. But if this statement is true, I agree that the tensorflow's formula for warp is the one to be used. But if you use the visualizations generated here in the evaluation notebooks, flow vectors are present at the original location of the pixel.

apxlwl commented 5 years ago

@philferriere Thank you for your discussion. Are there any updated results using the corrected function (i.e. query_points_on_grid = batched_grid + flow)? I noticed that the code in the master branch still uses -flow.

Blcony commented 5 years ago

@philferriere Thank you for your discussion. Are there any updated results using the corrected function (i.e. query_points_on_grid = batched_grid + flow)? I noticed that the code in the master branch still uses -flow.

Blcony commented 5 years ago

> The use of the dense_image_warp function is wrong. In line 194 of core_warp.py, you use query_points_on_grid = batched_grid - flow, but what this line does is a forward warp. In the model, warping image 2 to image 1 implies a backward warp, so this should be query_points_on_grid = batched_grid + flow. [...]
>
> This is a core issue that should have impacted the results; why this method trains so well despite it is beyond me.

Hi, rajat95~

Have you replicated the results (including fine-tuning on MPI-Sintel and KITTI) compared with the official version, using TensorFlow?

I just started learning optical flow estimation, so your reply would be very helpful to me!

Thank you very much!

Blcony

zhtmike commented 5 years ago

"In most of the datasets that I've seen, the flow vector is stored at the position that the pixel is moving to" I am not sure about this. But if this statement is true, I agree that the tensorflow's formula for warp is the one to be used. But if you use the visualizations generated here in the evaluation notebooks, flow vectors are present at the original location of the pixel.

I think @alisaaalehi's explanation is correct. By the way, the warping is applied to the second frame, as the paper mentions, so from the first frame's point of view it is still a forward warping.

Superlee506 commented 4 years ago

@philferriere Any progress? If @alisaaalehi is right, the original implementation would have some mistakes.

Superlee506 commented 4 years ago

@alisaaalehi Hi, your comment is not consistent with L153 in your project. Your experiments show that the flow vectors are stored at the source location; however, your answer here indicates that the flow vectors are stored at the destination location. Which one is right?

JunggiLee12 commented 4 years ago

@philferriere @alisaaalehi Thank you for your code and discussion. Are there any updated results in the code?

yaanggny commented 2 years ago

I think the code is wrong. First, the flow is defined as a vector from p1 in image1 to p2 in image2, so p2 = p1 + v. Second, the warp operation is to remap from image2, that is, to estimate the image1 pixels from image2 using the flow. So we get p1_pixel = I2(p2) = I2(grid + v).

Besides, we can warp an image pair using the flow: the minus version produces a badly aligned result, but the plus version really aligns the images (see the sketch below).
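
A minimal NumPy sketch (illustrative, not from the repo) of that comparison: warp im2 with grid + flow and with grid - flow, then measure the alignment error against im1. Nearest-neighbour sampling keeps it short:

import numpy as np

def warp(im2, flow, sign=+1):
    h, w = im2.shape
    gy, gx = np.mgrid[0:h, 0:w]
    qy = np.clip(np.rint(gy + sign * flow[..., 0]).astype(int), 0, h - 1)
    qx = np.clip(np.rint(gx + sign * flow[..., 1]).astype(int), 0, w - 1)
    return im2[qy, qx]

h, w, d = 32, 32, 5
im1 = np.random.rand(h, w)
im2 = np.roll(im1, shift=(d, d), axis=(0, 1))   # every pixel moves by (5, 5)
flow = np.full((h, w, 2), float(d))             # flow from im1 to im2

plus  = warp(im2, flow, sign=+1)                # backward warp: aligns im2 to im1
minus = warp(im2, flow, sign=-1)                # the stock '-' behaviour

print(np.abs(plus  - im1).mean())               # small (exact away from clipped borders)
print(np.abs(minus - im1).mean())               # clearly larger: badly aligned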