naver / r2d2


when is the loss_from_ap in the ReliabilityLoss class called ? #35

Closed hichem-abdellali closed 3 years ago

hichem-abdellali commented 3 years ago

Hi,

I am wondering when loss_from_ap in the ReliabilityLoss class is called, and why there is a minus in the equation instead of the plus used in (eq 5) in the paper:

return 1 - ap*rel - (1-rel)*self.base

Also, could you please provide details about the input parameters and functioning of the NghSampler2 function? Thanks

jerome-revaud commented 3 years ago

Hi @djidan10

So, first there is a mistake in the paper! The correct equation is the one in the code. This has already been discovered by people in previous issues, and I deeply apologize for that.

Second, the ReliabilityLoss class inherits from the PixelAPLoss class. So, basically, they share the same functions. In other words, ReliabilityLoss has the same forward() function as PixelAPLoss, except that it calls the loss_from_ap() from ReliabilityLoss.
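A stripped-down sketch of that dispatch, for illustration only. The class and method names mirror the ones discussed above, but the bodies are toy stand-ins: plain floats instead of tensors, the AP passed in directly, and `base=0.5` is an assumed value, so this is not the repository's code.

```python
class PixelAPLoss:
    """Toy stand-in: the real class computes the AP from sampled descriptors."""

    def loss_from_ap(self, ap, rel):
        # Base behaviour: plain AP loss, the reliability is ignored.
        return 1 - ap

    def forward(self, ap, rel):
        # forward() is defined only once, here. self.loss_from_ap resolves
        # to the subclass override when self is a ReliabilityLoss.
        return self.loss_from_ap(ap, rel)


class ReliabilityLoss(PixelAPLoss):
    def __init__(self, base=0.5):  # assumed value, for illustration
        self.base = base

    def loss_from_ap(self, ap, rel):
        # Same expression as the line quoted in the question.
        return 1 - ap * rel - (1 - rel) * self.base


plain = PixelAPLoss().forward(ap=0.9, rel=0.2)
reliab = ReliabilityLoss(base=0.5).forward(ap=0.9, rel=0.2)
```

With rel=1 the overridden expression reduces to 1 - ap, and with rel=0 it becomes the constant 1 - base, whatever the AP is.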

hichem-abdellali commented 3 years ago

Thank you @jerome-revaud for your reply. One last question: I have seen that the flow works from img1 to img2, so is this correct? aflow: 'absolute' optical flow = (x,y) position of each pixel from img1 in img2

And why are there 2 aflow? Is the second one (from img2 to img1) not used at all?

Thanks

jerome-revaud commented 3 years ago

Yes, the flow works from img1 to img2.

Where are there 2 aflow? Sorry, I'm not sure which part of the code you are referring to. If you are referring to the annotations, then they serve two purposes. First, the forward+backward flows are used to double-check errors in the optical flow (normally, going forward and then backward should give the identity). Second, the two images are randomly ordered as [img1,img2,aflow1to2] or [img2,img1,aflow2to1] during training. (I wrote the code for all this quite a long time ago, so I'm not 100% sure about it, but it should be something like that.)
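To make those two uses concrete, here is a small pure-Python sketch. It is only an illustration of the idea under simplifying assumptions (integer flows stored as nested lists, a pure translation between the images); the helper names `make_shift_flow`, `round_trip_error` and `random_order` are made up for this example and do not come from the repository.

```python
import random


def make_shift_flow(H, W, dx, dy):
    """Absolute flow of a pure translation: pixel (x, y) maps to (x+dx, y+dy)."""
    return [[(x + dx, y + dy) for x in range(W)] for y in range(H)]


def round_trip_error(aflow12, aflow21):
    """Largest |p - aflow21(aflow12(p))| over pixels whose match lies inside img2."""
    H, W = len(aflow21), len(aflow21[0])
    worst = 0.0
    for y, row in enumerate(aflow12):
        for x, (x2, y2) in enumerate(row):
            if 0 <= x2 < W and 0 <= y2 < H:  # skip pixels that leave the image
                xb, yb = aflow21[y2][x2]
                worst = max(worst, abs(xb - x), abs(yb - y))
    return worst


def random_order(img1, img2, aflow12, aflow21, rng=random):
    """Return the pair in either order, together with the matching flow."""
    if rng.random() < 0.5:
        return img1, img2, aflow12
    return img2, img1, aflow21


aflow12 = make_shift_flow(8, 8, dx=2, dy=1)
aflow21 = make_shift_flow(8, 8, dx=-2, dy=-1)
err = round_trip_error(aflow12, aflow21)                    # consistent flows
bad = round_trip_error(aflow12, make_shift_flow(8, 8, 0, 0))  # inconsistent flows
```

A round-trip error of zero means the backward flow undoes the forward flow; a non-zero value flags pixels where the annotation is unreliable.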

hichem-abdellali commented 3 years ago

Thanks a lot for your help. Any details about NghSampler2? It would be appreciated if you could provide some general details about it.

jerome-revaud commented 3 years ago

Well it is true that this is the most complicated part in the code.

The goal of the sampler is to select a small number of patch correspondences between two images img1 and img2 related by known optical flow aflow. In the paper, it is written that we use all possible pixels, but in practice this is too slow and memory-demanding, so we only use a subset of all pixels.

Now, the question is: which subset of pixels are we going to select?

There are 2 things to consider:

  1. easy negatives are more likely to be far from the ground-truth positive
  2. hard negatives are more likely to be close to the ground-truth positive

So ideally we want to mix them. And that's what is performed by NghSampler2(ngh=7, subq=-8, subd=1, pos_d=3, neg_d=5, border=16, subd_neg=-8, maxpool_pos=True):

  1. we first decide on the location of the query pixels in img1. They are either regularly or randomly sampled. In the code, this is done using gen_grid(step=-8). Note that self.sub_q=-8, meaning it's going to be a random sampling (because negative) that yields the same number of pixels as if we had regularly sampled pixels on an 8-pixel grid. For instance, if the image is 256x256, this would return 256x256/(8x8) = 1024 pixels. (Well, in fact we also prevent pixels near the border (border=16) from being selected, so it's less than that.)
  2. we then compute the position of their positive patch counterpart in img2 using aflow
  3. for each query, we then sample all pixels near the ground-truth position in img2. Nearby means less than self.pos_d=3 pixels away.
  4. we select the one with the highest similarity with the query as the true positive and discard the others
  5. we then sample hard negatives in img2, also in the neighborhood of the ground-truth positive. The rule is that they are at a distance between self.neg_d=5 and self.ngh=7 pixels from the ground-truth positive.
  6. we finally sample far-away negatives on a regularly-spaced or randomly-spaced grid in img2. We again use the gen_grid(step=-8) function with a grid step of self.sub_d_neg=-8 and proceed as before, this time making sure that we don't sample positives by mistake.

And that's it: for each query pixel in img1, we have selected in img2 one true positive, a small number of hard negatives which are close to the true positive, and a larger number of easy negatives which are far away.
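The first steps of the procedure above can be sketched in a few lines of plain Python. This is a rough illustration, not the repository's implementation: the helper names (`sample_queries`, `neighbour_offsets`, `best_positive`) are invented here, the distance tests use inclusive bounds as an assumption, the similarity is a toy function, and the far-away negatives of the last step are left out.

```python
import random


def sample_queries(H, W, step, border, rng):
    """Query positions in img1: regular grid if step > 0, random if step < 0."""
    xs = range(border, W - border)
    ys = range(border, H - border)
    if step > 0:
        return [(x, y) for y in ys[::step] for x in xs[::step]]
    # Random sampling with the same count as a regular grid of that step.
    n = (len(xs) * len(ys)) // (step * step)
    return [(rng.choice(xs), rng.choice(ys)) for _ in range(n)]


def neighbour_offsets(pos_d, neg_d, ngh):
    """Offsets around the ground-truth match: positives and hard negatives."""
    pos, neg = [], []
    for dy in range(-ngh, ngh + 1):
        for dx in range(-ngh, ngh + 1):
            d2 = dx * dx + dy * dy
            if d2 <= pos_d ** 2:                  # close enough: candidate positive
                pos.append((dx, dy))
            elif neg_d ** 2 <= d2 <= ngh ** 2:    # ring around it: hard negative
                neg.append((dx, dy))
    return pos, neg


def best_positive(gt, pos_offsets, similarity):
    """Keep only the candidate positive most similar to the query."""
    cands = [(gt[0] + dx, gt[1] + dy) for dx, dy in pos_offsets]
    return max(cands, key=similarity)


rng = random.Random(0)
regular = sample_queries(256, 256, step=8, border=16, rng=rng)
randomized = sample_queries(256, 256, step=-8, border=16, rng=rng)
pos, neg = neighbour_offsets(pos_d=3, neg_d=5, ngh=7)
# Toy similarity that peaks one pixel to the right of the ground truth.
match = best_positive((100, 50), pos, lambda p: -abs(p[0] - 101) - abs(p[1] - 50))
```

With border=16 on a 256x256 image, both calls return 28x28 = 784 queries rather than 1024, which is the "less than that" mentioned in step 1.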

hichem-abdellali commented 3 years ago

Hi @jerome-revaud, thanks a lot for the detailed explanation. There is one more thing which is still not clear to me: compute_AP. The function uses a quantizer with a conv1d. My question is: how should it be interpreted, and what is the goal of going from one channel to Mx2 channels (I see that M was set to 20)? For example, if the input is 100x100, the quantizer gives us 100x40x100, and after that you take the minimum of the two parts [0:19] and [20:39]. What is the meaning of that? Thanks in advance

jerome-revaud commented 3 years ago

So this concerns the soft-binning stage. Specifically, scores are softly quantized into soft bins using a triangular kernel (see the original paper for more details). Each triangle in the kernel is made up of 2 lines, so if you have 20 bins there are 40 lines.

For instance, imagine a single triangle from (-1,0) to (0,1) and then to (1,0). There are two lines, y1 = 1+x and y2 = 1-x, and the kernel function is obtained as y = max(0, min(1+x, 1-x)) (hence the min operation, plus a clipping to 0).
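The same construction extended to M bins can be written as a short sketch. It assumes bin centres evenly spaced on [0, 1] and works on a single scalar score in plain Python, whereas the code discussed above does this for whole tensors with a conv1d; the function name `soft_bins` is made up for this example, and the first and last bins may be handled differently in the repository.

```python
def soft_bins(x, nq=20, lo=0.0, hi=1.0):
    """Soft-assign a scalar score x to nq triangular bins on [lo, hi]."""
    a = (nq - 1) / (hi - lo)                  # inverse of the bin spacing
    centres = [lo + k / a for k in range(nq)]
    # 2*nq line values: first half are rising edges, second half falling edges.
    rising = [1 + a * (x - c) for c in centres]
    falling = [1 - a * (x - c) for c in centres]
    lines = rising + falling
    # Triangle k = min of its two lines, clipped at 0.
    return [max(0.0, min(lines[k], lines[nq + k])) for k in range(nq)]


w = soft_bins(0.3)
active = [k for k, v in enumerate(w) if v > 0]
```

A score falling between two bin centres gets a non-zero weight in just those two bins, and the weights sum to 1, which is what makes the binning "soft".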