magicleap / SuperGluePretrainedNetwork

SuperGlue: Learning Feature Matching with Graph Neural Networks (CVPR 2020, Oral)

Do NMS before or after keypoints are detected: what's the difference? #112

Open · Vincentqyw opened this issue 2 years ago

Vincentqyw commented 2 years ago

Hi, I notice that there are slight differences between the Magic Leap SuperPoint implementation by Daniel and this repo. In this repo, NMS (defined by a max-pooling kernel) is performed on the heatmap/scores matrix (https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/models/superpoint.py#L167), but in Daniel's version, NMS is performed on the keypoint locations (https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/demo_superpoint.py#L258). I can't figure out why you made this revision, or what the differences in the final keypoint locations are. Thanks for your reply :) @Skydes
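To make the comparison concrete, here is a schematic sketch of the two flavours (simplified Python, not the exact code of either repo): Daniel's version greedily suppresses ranked keypoint locations, while this repo builds a suppression mask directly on the dense score map with max-pooling.

```python
import torch

def nms_on_locations(keypoints, scores, radius):
    """Location-based NMS (schematic): visit points by descending score,
    keep one, suppress all remaining points within `radius` of it."""
    order = scores.argsort(descending=True)
    suppressed = torch.zeros(len(scores), dtype=torch.bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(int(i))
        # Chebyshev distance matches a square (2*radius+1)^2 window
        dist = (keypoints - keypoints[i]).abs().max(dim=1).values
        suppressed |= dist <= radius
    return keep

def nms_on_score_map(scores, radius):
    """Score-map NMS (schematic): a pixel survives iff it equals the
    maximum of its (2*radius+1)^2 neighbourhood; batched, no sorting."""
    local_max = torch.nn.functional.max_pool2d(
        scores, kernel_size=2 * radius + 1, stride=1, padding=radius)
    return torch.where(scores == local_max, scores, torch.zeros_like(scores))
```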

sarlinpe commented 2 years ago

Hi @Vincentqyw, good question! The two approaches produce nearly identical results, but the implementation in this repo has one key advantage:

The whole forward pass of SuperPoint is actually batched, so we used this implementation of SuperPoint in our training pipeline, detecting & describing data-augmented images on the fly.
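As an illustration of the batching (a minimal sketch, assuming this repo's models/superpoint.py and its default config keys):

```python
import torch
from models.superpoint import SuperPoint  # this repo

superpoint = SuperPoint({'nms_radius': 4, 'keypoint_threshold': 0.005})

# One forward pass over a whole batch of (augmented) grayscale images;
# the max-pooling NMS runs on all score maps at once.
images = torch.rand(8, 1, 240, 320)
pred = superpoint({'image': images})  # per-image keypoints, scores, descriptors
```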

Vincentqyw commented 2 years ago

Excellent! Thanks again for your prompt reply!

javierttgg commented 2 years ago

Hi @Skydes, if I may ask something related: why should points that are not initially detected as local maxima in `max_mask = scores == max_pool(scores)` be added in `max_mask = max_mask | (new_max_mask & (~supp_mask))`? https://github.com/magicleap/SuperGluePretrainedNetwork/blob/ddcf11f42e7e0732a0c4607648f9448ea8d73590/models/superpoint.py#L55-L62 I'm struggling to see the intuition behind this decision. I believe this is also a difference with respect to Daniel's SuperPoint implementation. Thanks in advance! :)

sarlinpe commented 2 years ago

The initialization at L56 is too conservative as it doesn't account for chains: consider a point A that suppresses B, which itself suppresses C. Daniel's original implementation would retain both A and C if they are sufficiently far from each other, while L56 alone would retain only A. By removing the suppressed pixels (`supp_mask`) from `supp_scores` and propagating the scores for a few iterations, we recover points like C. This is a very good approximation and much faster than the original iterative implementation.
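A tiny 1D check of this chain case, using simple_nms from this repo's models/superpoint.py (positions and scores chosen for illustration): A = 0.9 at x = 0 suppresses B = 0.8 at x = 2, which would in turn suppress C = 0.7 at x = 4; A and C are 4 px apart, outside each other's radius of 2, so both should survive.

```python
import torch
from models.superpoint import simple_nms  # the function discussed above

scores = torch.tensor([[[[.9, 0., .8, 0., .7, 0., 0.]]]])  # A, B, C on one row
out = simple_nms(scores, nms_radius=2)
print(out.nonzero()[:, -1].tolist())  # -> [0, 4]: A and C kept, B suppressed
```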

javierttgg commented 2 years ago

Understood, thanks a lot @Skydes

javierttgg commented 2 years ago

Sorry to bother again @Skydes, I experimented with the code a bit and found a possible undesired result.

This way of performing NMS can produce false positives. See for instance this 1D example with an NMS radius of 2:

[figure: 1D example where NMS keeps a non-maximum point (false positive)]

In this case the false positive is 4.

A more complete example with nms_radius = 4:

[figure: scores / expected / actual result panels for the 2D pyramid example below]

[Code for reproduction]

```python
import torch
import matplotlib.pyplot as plt


def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert(nms_radius >= 0)

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask))
    return torch.where(max_mask, scores, zeros)


if __name__ == "__main__":
    nms_radius = 4
    # 1D example (immediately overwritten by the 2D example below)
    scores = torch.tensor(
        [[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]])
    # 2D pyramid example with a single true maximum at (4, 2)
    scores = torch.tensor(
        [[[[.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0]]]])

    nms = simple_nms(scores, nms_radius)

    fig, axes = plt.subplots(ncols=3)
    titles = ['scores', 'expected', 'actual result']
    axes[0].imshow(scores.numpy()[0, 0], cmap='jet', vmin=0., vmax=0.9)
    axes[1].imshow((scores == 0.9).float()[0, 0].numpy(),
                   cmap='Greens', vmin=0, vmax=1)
    axes[2].imshow((nms > 0).float()[0, 0].numpy(),
                   cmap='Greens', vmin=0, vmax=1)
    for axi, title in zip(axes, titles):
        axi.set(title=title)
    fig.tight_layout()
    plt.show()
```

I believe this issue can easily be corrected by additionally checking, with `nms_radius = 1`, whether the newly detected keypoints are local maxima, i.e.:

```python
def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert(nms_radius >= 0)

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    def max_pool_r1(x):  # ❕ added function
        return torch.nn.functional.max_pool2d(
            x, kernel_size=3, stride=1, padding=1)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    max_mask_r1 = scores == max_pool_r1(scores)  # ❕ added line
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask) & max_mask_r1)  # ❕ modified line
    return torch.where(max_mask, scores, zeros)
```

New result:

[figure: with the fix, the actual result matches the expected single maximum]
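For completeness, a quick numeric check (assuming the `scores` tensor from the reproduction script and the modified `simple_nms` above):

```python
nms_fixed = simple_nms(scores, nms_radius=4)  # the modified version above
print(nms_fixed.nonzero()[:, 2:].tolist())    # -> [[4, 2]]: only the true peak remains
```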

As a side note, I suspect this issue is unlikely to occur often or to matter much in practice (the proof being that SuperGlue works great :) ). Nevertheless, the fix may help a bit; if deemed useful, I can submit a PR.

On the other hand, it may not be sensible to introduce this change while keeping the current SuperGlue weights: after the change, the keypoint patterns detected by SuperPoint may shift slightly, so the combination should at least be tested first.

sarlinpe commented 2 years ago

Good catch, thank you for the detailed analysis and the minimal code example. This should ideally be fixed, though I indeed expect it not to make much difference. I'll run some evaluations when I find the time.