longcw / RoIAlign.pytorch

RoIAlign & crop_and_resize for PyTorch

What is the advantage of transform_fpcoor=True as opposed to False? #19

Open lozino opened 5 years ago

lozino commented 5 years ago

I've run some tests to understand exactly how this implementation works.

Let's suppose my features are as follows:

input:
tensor([[[[0., 1., 2., 3., 4., 5., 6.],
          [0., 1., 2., 3., 4., 5., 6.],
          [0., 1., 2., 3., 4., 5., 6.],
          [0., 1., 2., 3., 4., 5., 6.],
          [0., 1., 2., 3., 4., 5., 6.],
          [0., 1., 2., 3., 4., 5., 6.]]]])

and I want to apply RoIAlign (crop size 3x3) for a box having the following coordinates:

box:
tensor([[2., 2., 4., 4.]])

Now, the result I obtain with transform_fpcoor=False is as follows:

tensor([[[[2., 3., 4.],
          [2., 3., 4.],
          [2., 3., 4.]]]])

which makes sense, since I'm cropping a 3x3 box that is aligned with the coordinates of the input. What I don't quite understand is why RoIAlign with transform_fpcoor=True produces this instead:

tensor([[[[1.8333, 2.5000, 3.1667],
          [1.8333, 2.5000, 3.1667],
          [1.8333, 2.5000, 3.1667]]]])

In particular:
1) Why are values fetched from outside the box (2, 2, 4, 4)? (Notice the values < 2.)
2) Why does the interpolation seem to stop earlier than it is supposed to? (Notice that the max value is only slightly above 3.)

It looks as if the box it samples from were (2, 2, 3, 3). This is supported by the fact that if I run RoIAlign (transform_fpcoor=True) on the box (2, 2, 5, 5) I get:

tensor([[[[2., 3., 4.],
          [2., 3., 4.],
          [2., 3., 4.]]]])

Could you please explain why RoIAlign behaves like this? Thank you!
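
For reference, all three outputs above can be reproduced with the minimal pure-Python sketch below. It rests on an assumption rather than a confirmed description of the library: that transform_fpcoor=True splits the box into crop_size equal bins, samples each bin at its center, and shifts by half a pixel (treating box coordinates as continuous edges and feature indices as pixel centers), while transform_fpcoor=False samples evenly from corner to corner. The names sample_coords and interp_row are hypothetical helpers, not part of RoIAlign.pytorch.

```python
import math


def sample_coords(lo, hi, crop_size, transform_fpcoor):
    """Sampling positions along one axis for a box spanning [lo, hi].

    ASSUMPTION: this mirrors the behaviour inferred from the outputs in
    this issue; it is not taken from the library's documentation.
    """
    if transform_fpcoor:
        # Split the box into crop_size bins, sample each bin center,
        # then shift by -0.5 to move from continuous to pixel-index coords.
        spacing = (hi - lo) / crop_size
        start = lo + spacing / 2 - 0.5
    else:
        # Corner-aligned: first sample at lo, last sample at hi.
        spacing = (hi - lo) / (crop_size - 1)
        start = lo
    return [start + i * spacing for i in range(crop_size)]


def interp_row(row, x):
    """Linearly interpolate a 1-D row of feature values at position x."""
    x0 = math.floor(x)
    x1 = min(x0 + 1, len(row) - 1)
    frac = x - x0
    return row[x0] * (1 - frac) + row[x1] * frac


# One row of the input above: the value at column j is simply j.
row = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def crop_row(lo, hi, transform_fpcoor, crop_size=3):
    xs = sample_coords(lo, hi, crop_size, transform_fpcoor)
    return [round(interp_row(row, x), 4) for x in xs]


print(crop_row(2, 4, transform_fpcoor=False))  # [2.0, 3.0, 4.0]
print(crop_row(2, 4, transform_fpcoor=True))   # [1.8333, 2.5, 3.1667]
print(crop_row(2, 5, transform_fpcoor=True))   # [2.0, 3.0, 4.0]
```

Under that assumption, the box (2, 2, 4, 4) has bins of width 2/3 whose centers lie at 2.3333, 3.0 and 3.6667 in continuous coordinates. The half-pixel shift moves them to 1.8333, 2.5 and 3.1667, which would account for both the value below 2 and the maximum just above 3.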