longcw / RoIAlign.pytorch

RoIAlign & crop_and_resize for PyTorch

Hi gradcheck failed #1

Open mks0601 opened 6 years ago

mks0601 commented 6 years ago

Hi, thanks for sharing your implementation.

I want to use a RoIAlign layer in my PyTorch code, and I found your implementation. To verify it, I ran test.py and the gradcheck failed. Did you check the code?

longcw commented 6 years ago

I also noticed this problem. There is a gap between the numerical gradient and the analytical gradient. However, the outputs and gradients of the PyTorch version and the TensorFlow version are almost the same.

 numerical:(
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.2012  0.0000  0.0000  0.0000
 0.6258  0.1490  0.0000  0.0000
 0.0000  0.6855  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0373  0.0000  0.1788  0.0000
 0.1341  0.0298  0.5662  0.1192
 0.0000  0.1490  0.0000  0.5960
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0596  0.0000
 0.0000  0.0000  0.2086  0.0596
 0.0000  0.0000  0.0000  0.2384
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 25x4]
,)
analytical:(
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.2111  0.0000  0.0000  0.0000
 0.6141  0.1408  0.0000  0.0000
 0.0000  0.6844  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0447  0.0000  0.1893  0.0000
 0.1300  0.0298  0.5507  0.1263
 0.0000  0.1449  0.0000  0.6138
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0665  0.0000
 0.0000  0.0000  0.1934  0.0444
 0.0000  0.0000  0.0000  0.2156
 0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000
[torch.FloatTensor of size 25x4]
,)
mks0601 commented 6 years ago

Thank you for checking. Did you achieve similar results with Mask R-CNN using your RoI align module?

longcw commented 6 years ago

I am not working on Mask RCNN. BTW, I found that this layer can pass the gradcheck if I set eps=1e-3. eps is the perturbation for finite differences.

gradcheck(roi_align, (image_torch, boxes, box_index), eps=1e-3)

output (max_val, min_error, max_error, mean_error):

('forward:', 0.87139809, 0.0, 7.0184469e-06, 5.5792748e-07)
('backward:', 1.0228419, 0.0, 1.7911196e-05, 9.7078487e-09)
test ok
mks0601 commented 6 years ago

How many times did you run the gradcheck?

I ran it 10 times, but only 2 runs passed the gradcheck.

longcw commented 6 years ago

@mks0601 Try to modify the random input image:

# image_data = np.random.randn(batch_size, depth, im_height, im_width).astype(np.float32)
# =>
image_data = np.random.rand(batch_size, depth, im_height, im_width).astype(np.float32)
mks0601 commented 6 years ago

Sorry to say, but changing the input does not seem like a good fix...

It suggests that the implemented RoI align layer only works for a specific kind of input (or at least that it does not work for some inputs, such as those drawn with randn).

Can you tell me why this kind of error occurs?

longcw commented 6 years ago

I don't think this is a problem with the implementation. It is a problem with how we are using gradcheck. Changing randn to rand actually decreases the maximum value of the inputs. The check always passes if eps > max(inputs)/500, whatever the input is.

I don't know the real reason. You can look at the gradcheck function and its source code if you want to figure out the cause of this problem.
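A likely explanation, not confirmed anywhere in this thread, is float32 rounding. The PyTorch documentation states that gradcheck's default tolerances are designed for double-precision inputs, and its default perturbation is eps=1e-6. The tensors above are FloatTensors, so a finite difference over such a small step is dominated by rounding error. The sketch below emulates float32 with the standard library only; `f32`, `sample` and `numerical_grad` are hypothetical helpers, and the weight 0.2111 is simply one entry copied from the analytical gradient printed earlier.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sample(pixel, weight=0.2111):
    # A bilinear sample is linear in each input pixel, so a single
    # weight is enough for illustration. Inputs, weight and result
    # are all rounded to float32, as they would be in a FloatTensor.
    return f32(f32(weight) * f32(pixel))


def numerical_grad(pixel, eps):
    # Central difference, the scheme used by finite-difference checks.
    hi = sample(pixel + eps)
    lo = sample(pixel - eps)
    return (hi - lo) / (2 * eps)


for eps in (1e-6, 1e-3):
    err = abs(numerical_grad(0.7, eps) - 0.2111)
    print(f"eps={eps:g}  |numerical - analytical| = {err:.2e}")
```

With the small step, the difference `hi - lo` can only take values that are multiples of the float32 spacing, so the numerical gradient is coarsely quantized. This would also be consistent with the observation that larger inputs need a larger eps. Under this assumption, running the check on double-precision tensors is the usual alternative to raising eps.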

mks0601 commented 6 years ago

Also, could you help me understand the output of your roi_align layer?

If I feed the input tensor as

0 1 2 3 4 5 6
0 1 2 3 4 5 6
0 1 2 3 4 5 6
0 1 2 3 4 5 6
0 1 2 3 4 5 6
0 1 2 3 4 5 6
0 1 2 3 4 5 6

(1x1x7x7)

to the roi_align layer (crop_height=3, crop_width=3)

with arguments (xs = [[0,2]], ys = [[0,2]], nbox=1, nbatch=1),

then I think the output should be

0 1 2
0 1 2
0 1 2

However, the output of your implementation is different.

Did I misunderstand RoI align?

longcw commented 6 years ago

What you want is crop_and_resize.

import numpy as np
import torch
from torch.autograd import Variable

from roi_align.roi_align import RoIAlign

def to_variable(arr, requires_grad=False, is_cuda=True):
    # Wrap a NumPy array in a Variable, optionally moving it to the GPU.
    tensor = torch.from_numpy(arr)
    if is_cuda:
        tensor = tensor.cuda()
    var = Variable(tensor, requires_grad=requires_grad)
    return var

# inputs
is_cuda = False
# 7x7 image whose value at every pixel equals its column index
image_data = np.tile(np.arange(7, dtype=np.float32), 7).reshape(7, 7)
image_data = image_data[np.newaxis, np.newaxis]  # shape 1x1x7x7
boxes_data = np.asarray([[0, 0, 2, 2]], dtype=np.float32)
box_index_data = np.asarray([0], dtype=np.int32)

image_torch = to_variable(image_data, requires_grad=True, is_cuda=is_cuda)
boxes = to_variable(boxes_data, requires_grad=False, is_cuda=is_cuda)
box_index = to_variable(box_index_data, requires_grad=False, is_cuda=is_cuda)

# setting transform_fpcoor=False gives plain crop_and_resize
roi_align = RoIAlign(3, 3, transform_fpcoor=False)
print(roi_align(image_torch, boxes, box_index))

output:

(0 ,0 ,.,.) = 
  0  1  2
  0  1  2
  0  1  2
[torch.cuda.FloatTensor of size 1x1x3x3 (GPU 0)]

If you use RoIAlign as implemented here:

# input
...
boxes_data = np.asarray([[0, 0, 3, 3]], dtype=np.float32)
...
roi_align = RoIAlign(3, 3, transform_fpcoor=True)
print(roi_align(image_torch, boxes, box_index))

output:

Variable containing:
(0 ,0 ,.,.) = 
  0  1  2
  0  1  2
  0  1  2
[torch.FloatTensor of size 1x1x3x3]

You can read more about RoIAlign here:
https://github.com/ppwwyyxx/tensorpack/blob/6d5ba6a970710eaaa14b89d24aace179eb8ee1af/examples/FasterRCNN/NOTES.md
https://github.com/ppwwyyxx/tensorpack/blob/6d5ba6a970710eaaa14b89d24aace179eb8ee1af/examples/FasterRCNN/model.py#L316

mks0601 commented 6 years ago

Oh, I misunderstood your code. Thanks for clarifying. However, I find the links you provided hard to understand :(

Could you help me with them? I read NOTES.md, but I cannot see how crop_and_resize differs from roi_align apart from the input format (normalized vs. unnormalized coordinates?). I also cannot follow the code.

If I set boxes_data to [xmin, ymin, xmax+1, ymax+1] and set transform_fpcoor=True, it seems to work well so far. Is setting boxes_data to [xmin, ymin, xmax+1, ymax+1] correct? And does fpcoor stand for feature plane coordinates?

longcw commented 6 years ago

crop_and_resize: bilinear sampling that assumes the floating-point coordinate (0.0, 0.0) is the same as pixel (0, 0). [image: crop_and_resize]

RoIAlign: first split the RoI into crop_size grid cells of equal size, then bilinearly sample one value for each cell. [image: roi_align]

To use crop_and_resize for RoIAlign, we shift the grid by -0.5. [image: roi_align_shifted]

In your case, if you set bbox=[0, 0, 2, 2], the crop is

Variable containing:
(0 ,0 ,.,.) = 
  0.0000  0.0000  0.0000
  0.0000  0.5000  1.1667
  0.0000  0.5000  1.1667
[torch.FloatTensor of size 1x1x3x3]

[image: roi_align_2]
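Since the pictures are not available, the two sampling conventions described above can be sketched in plain Python. This is an illustration only, not the repository's C/CUDA code: `bilinear`, `sample_points` and `crop` are hypothetical helpers, and returning 0 for points outside the image is an assumption chosen to match the zeros in the output above.

```python
def bilinear(image, y, x, fill=0.0):
    """Bilinearly sample a 2-D list; points outside the image give `fill`."""
    h, w = len(image), len(image[0])
    if y < 0 or y > h - 1 or x < 0 or x > w - 1:
        return fill
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_points(lo, hi, n, align):
    if align:
        # RoIAlign: n equal bins, sample each bin center, shift by -0.5.
        step = (hi - lo) / n
        return [lo + (i + 0.5) * step - 0.5 for i in range(n)]
    # crop_and_resize: n points spread evenly from lo to hi inclusive.
    step = (hi - lo) / (n - 1) if n > 1 else 0.0
    return [lo + i * step for i in range(n)]


def crop(image, box, size, align):
    x0, y0, x1, y1 = box
    xs = sample_points(x0, x1, size, align)
    ys = sample_points(y0, y1, size, align)
    return [[bilinear(image, y, x) for x in xs] for y in ys]


# 7x7 image whose value at every pixel equals its column index.
image = [[float(col) for col in range(7)] for _ in range(7)]
for box, align in [((0, 0, 2, 2), False), ((0, 0, 3, 3), True), ((0, 0, 2, 2), True)]:
    print(box, "align" if align else "crop_and_resize")
    for row in crop(image, box, 3, align):
        print("  ", [round(v, 4) for v in row])
```

The three cases correspond to the three outputs quoted in this thread: transform_fpcoor=False with box [0, 0, 2, 2], transform_fpcoor=True with box [0, 0, 3, 3], and transform_fpcoor=True with box [0, 0, 2, 2].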

mks0601 commented 6 years ago

That helps a lot, thank you. So the difference comes from dividing the RoI into a grid. Then I think passing [xmin, ymin, xmax+1, ymax+1] should give the desired output, where each value is a floating-point coordinate. Is that right?
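A short worked derivation may help here. It assumes the sampling rule described earlier in the thread (split the box into n equal bins, take each bin center, then shift by -0.5); the symbols x_i and n are introduced only for this illustration.

```latex
x_i = x_0 + \left(i + \tfrac{1}{2}\right)\frac{x_1 - x_0}{n} - \tfrac{1}{2},
\qquad i = 0, \dots, n-1

% With x_0 = x_{\min},\ x_1 = x_{\max} + 1,\ n = x_{\max} - x_{\min} + 1
% the bin width is 1 and the samples fall exactly on pixel indices:
x_i = x_{\min} + i

% With x_0 = 0,\ x_1 = 2,\ n = 3 (no +1) the bin width is 2/3:
x_i \in \left\{ -\tfrac{1}{6},\ \tfrac{1}{2},\ \tfrac{7}{6} \right\}
```

The second case matches the row 0.0000 0.5000 1.1667 shown above for bbox=[0, 0, 2, 2], with the sample at -1/6 falling outside the image. The exact match in the first case holds only when the crop size equals the number of pixels covered; otherwise the samples are interpolated between pixels.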

mks0601 commented 6 years ago

Sorry, but I still cannot understand the code. What is the input of your roi_align module? Let's say the bounding box of the RoI is (xmin, ymin, xmax, ymax). What should I then pass to your roi_align module?

spacing_w is a function of x1-x0, not x1-x0+1, so I think xmax and ymax should each be incremented by 1. Also, I cannot understand why we have to subtract 0.5.

What if we just use

x0, y0, x1, y1 = tf.split(boxes, 4, axis=1)

nx0 = x0 / tf.to_float(image_shape[1] - 1)
ny0 = y0 / tf.to_float(image_shape[0] - 1)

nx1 = x1 / tf.to_float(image_shape[1] - 1)
ny1 = y1 / tf.to_float(image_shape[0] - 1)

return tf.concat([ny0, nx0, ny1, nx1], axis=1)

and transform_fpcoor = False?

tensorboy commented 6 years ago

Hi @mks0601, you can see how it is used for Mask R-CNN here: https://github.com/tensorboy/Pytorch_Mask_RCNN. :)

fitsumreda commented 6 years ago

@tensorboy I couldn't access the link. Could you share a working one?

pachiko commented 5 years ago

The pictures in longcw's explanation above (crop_and_resize, roi_align, roi_align_shifted, roi_align_2) are missing. It would be great if you could re-upload them :)