yewzijian / RPMNet

RPM-Net: Robust Point Matching using Learned Features (CVPR2020)

Questions about the inlier loss #32

Open · qsisi opened this issue 2 years ago

qsisi commented 2 years ago

Hello! I have a question about the inlier loss as defined in Eq. (11) of the paper.

In my understanding, the inlier loss in the paper averages the total summation of the confidence matrix m_jk over J and over K, to encourage more entries to be labeled as inliers: [screenshot of Eq. (11) from the paper]

But in the code, the actual computation of the inlier loss is a little different: [screenshot of the inlier loss computation in the code]

You add an additional scalar one and subtract the sum of each row and column from it. Why add this? Is it related to the performance of training?

It would be very helpful if you could give me some hints about this.

Thanks for your help.

yewzijian commented 2 years ago

Hi qsisi, there's no difference: adding a constant scalar won't affect the training. The code is written like this because the original idea was to minimize the slack column/row, which is given by 1 - sum(m_jk). During the review process we noticed the shift was not needed, so we removed it from the paper.
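
A minimal PyTorch sketch (a toy example, not the repository's loss code) of why this is true: the two forms differ only by a constant, so their gradients with respect to the match matrix are identical.

```python
# Toy example (not the repository's loss code): the "code" form with the
# extra scalar one and the "paper" form without it differ only by a constant,
# so their gradients w.r.t. the match matrix are identical.
import torch

J, K = 5, 6
logits = torch.randn(J, K, requires_grad=True)
m = torch.sigmoid(logits)  # stand-in for the predicted match matrix m_jk

# Form in the code: penalize the slack mass 1 - sum(m_jk) per row / column.
loss_code = torch.mean(1 - m.sum(dim=1)) + torch.mean(1 - m.sum(dim=0))
# Form in the paper: drop the constant ones, i.e. just reward matched mass.
loss_paper = torch.mean(-m.sum(dim=1)) + torch.mean(-m.sum(dim=0))

loss_code.backward(retain_graph=True)
grad_code = logits.grad.clone()
logits.grad = None
loss_paper.backward()
grad_paper = logits.grad

print(torch.allclose(grad_code, grad_paper))  # True: the losses differ by exactly 2
```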

qsisi commented 2 years ago

I totally get your point. Thank you so much.

Also, I would like to ask a few questions of my own, if you don't mind:

Is RPMNet the first work to use Sinkhorn operations in 3D point matching? Both RPMNet and SuperGlue use Sinkhorn operations to generate soft assignments from deep 3D and 2D features, and both appeared at CVPR 2020. Is it just a coincidence that the same idea was employed for both 3D and 2D matching? I find it interesting that the two works appeared in the same year, and I hope it isn't rude to ask this directly here.

Thanks.

yewzijian commented 2 years ago

No, the original robust point matching (RPM) work cited in the paper already uses Sinkhorn operations. However, to the best of our knowledge, we are the first to use it in the learned registration setting (together with SuperGlue).

Yes, it's quite a coincidence. I met the author of Superglue during the conference, and both of us were surprised about it :)

qsisi commented 2 years ago

Thank you so much for your explanations.

Also, I have several new questions about Sinkhorn; I hope you don't mind me asking them here. :)

In my understanding, the Sinkhorn operation was originally proposed in Sinkhorn's 1964 paper ([31] as cited in RPMNet), which shows that any positive NxN matrix can be converted into a doubly stochastic matrix by Sinkhorn normalization.

But what exactly does that doubly stochastic matrix mean with respect to the original positive input matrix?

I noticed that there are later papers like "Sinkhorn Distances: Lightspeed Computation of Optimal Transport" (2013), but I didn't quite get the point from reading it (it was hard for me to understand :( ).

It would help a lot if you could give me some pointers!

yewzijian commented 2 years ago

Doubly stochastic means each row and column sums to one.
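
A toy sketch of plain Sinkhorn normalization (my own example, without the slack row/column that RPMNet adds) illustrating this property:

```python
# Repeatedly normalizing the rows and columns of a positive matrix drives it
# towards a doubly stochastic matrix (rows and columns each sum to one).
import torch

def sinkhorn(scores: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    m = scores.clone()
    for _ in range(n_iters):
        m = m / m.sum(dim=1, keepdim=True)  # make rows sum to 1
        m = m / m.sum(dim=0, keepdim=True)  # make columns sum to 1
    return m

m = sinkhorn(torch.rand(4, 4) + 0.1)
print(m.sum(dim=1))  # ~[1, 1, 1, 1]
print(m.sum(dim=0))  # ~[1, 1, 1, 1]
```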

qsisi commented 2 years ago

Yes, I know that a doubly stochastic matrix is a matrix whose rows and columns each sum to one; what confuses me is the meaning of that output doubly stochastic matrix.

For example, in the optimal transport context the input is the cost matrix, and the output of the Sinkhorn operation is the assignment matrix. Then in point matching, the cost matrix becomes the L2 distances computed from deep features, and the output matrix is still the assignment matrix. Do I understand it correctly?

yewzijian commented 2 years ago

Yes your understanding looks right to me.
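
A toy illustration of this interpretation (my own example, not code from either repository): feature L2 distances play the role of the cost, exp(-cost) gives positive affinities, and the Sinkhorn-normalized result is read as a soft assignment between the two point sets.

```python
# Deep-feature distances -> positive affinity matrix -> soft assignment.
import torch

points_y = torch.randn(6, 3)        # target point cloud Y (toy data)
feat_x = torch.randn(6, 32)         # hypothetical per-point features of X
feat_y = torch.randn(6, 32)         # hypothetical per-point features of Y

cost = torch.cdist(feat_x, feat_y)  # pairwise L2 distances in feature space
assign = torch.exp(-cost)           # lower cost -> higher affinity
for _ in range(20):                 # same alternating normalization as above
    assign = assign / assign.sum(dim=1, keepdim=True)
    assign = assign / assign.sum(dim=0, keepdim=True)

# Each row of `assign` is (approximately) a distribution over matches in Y,
# so a soft "corresponding point" for every point of X is a weighted average:
corr_y = assign @ points_y
print(corr_y.shape)                 # (6, 3)
```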

qsisi commented 2 years ago

Sorry to bother you again. I noticed that in SuperGlue, the Sinkhorn operation is implemented like this: https://github.com/magicleap/SuperGluePretrainedNetwork/blob/ddcf11f42e7e0732a0c4607648f9448ea8d73590/models/superglue.py#L152. Compared to the implementation in RPMNet there are quite a few differences; could you give some comments on them?

Thanks.

yewzijian commented 2 years ago

Hi, for my own implementation, I scaled the matrix without the last column (i.e. the (M+1)xN part) or without the last row (the Mx(N+1) part), so that each row/column in the main MxN part of the matrix sums to <= 1. I'm not able to comment much on Predator's implementation; from a quick look, it seems to follow the reasoning in their paper, which normalizes the entire matrix, including the slack row and column, so that it satisfies Eq. (9) in their paper.
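
A simplified, non-log-domain sketch of the slack handling described above (an illustration of the idea only; the actual RPMNet code works in log space with logsumexp): the slack row is excluded from the row-normalization step and the slack column from the column-normalization step, so each row/column of the main MxN block ends up summing to at most one.

```python
# Slack-augmented Sinkhorn: leftover mass goes into the slack row/column.
import torch

def sinkhorn_with_slack(scores: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    M, N = scores.shape
    a = torch.ones(M + 1, N + 1)                  # slack row/column start at 1
    a[:M, :N] = scores
    for _ in range(n_iters):
        # Normalize each non-slack row over its N+1 entries (incl. its slack).
        a[:M] = a[:M] / a[:M].sum(dim=1, keepdim=True)
        # Normalize each non-slack column over its M+1 entries (incl. its slack).
        a[:, :N] = a[:, :N] / a[:, :N].sum(dim=0, keepdim=True)
    return a[:M, :N]

m = sinkhorn_with_slack(torch.rand(5, 7) + 0.1)
print(m.sum(dim=1))  # each row sums to <= 1 (approximately, after convergence)
print(m.sum(dim=0))  # each column sums to <= 1
```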

qsisi commented 2 years ago

Thanks for your answer, although I still don't fully understand the difference between your implementation and the one in SuperGlue.

Here is a new question :)

In your weighted procrustes implementation:

https://github.com/yewzijian/RPMNet/blob/2cbdfe91d66d2076b0a94d2ee7ff362ba6e272f9/src/models/rpmnet.py#L127

which I think corresponds to the weighted covariance computation S = X W Y^T in https://vincentqin.gitee.io/blogresource-3/slam-common-issues-ICP/svd_rot.pdf

where a_centered and b_centered denote X and Y respectively. But weights_normalized here, in my understanding, corresponds to W / torch.sum(W) instead of the W in that document. Should weights_normalized be replaced by weights here?

Thank you very much for your help.

yewzijian commented 2 years ago

It shouldn't matter, and should still give you the same answer.

Zi Jian
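
A quick sketch (standard weighted Kabsch/Procrustes, not the repository code) checking this: scaling the weights by a positive constant, e.g. normalizing them to sum to one, leaves R and t unchanged, because the weighted centroids are unaffected and the cross-covariance is only scaled, which does not change its SVD directions.

```python
import torch

def weighted_procrustes(a: torch.Tensor, b: torch.Tensor, w: torch.Tensor):
    """a, b: (N, 3) corresponding points; w: (N,) non-negative weights."""
    w_col = w[:, None]
    centroid_a = (w_col * a).sum(dim=0) / w.sum()
    centroid_b = (w_col * b).sum(dim=0) / w.sum()
    a_c, b_c = a - centroid_a, b - centroid_b
    cov = a_c.t() @ (w_col * b_c)                # 3x3 weighted cross-covariance
    u, _, vh = torch.linalg.svd(cov)
    v = vh.t()
    d = torch.sign(torch.det(v @ u.t())).item()  # fix a possible reflection
    rot = v @ torch.diag(torch.tensor([1.0, 1.0, d])) @ u.t()
    trans = centroid_b - rot @ centroid_a
    return rot, trans

a, b, w = torch.randn(50, 3), torch.randn(50, 3), torch.rand(50)
r1, t1 = weighted_procrustes(a, b, w)
r2, t2 = weighted_procrustes(a, b, w / w.sum())  # normalized weights
print(torch.allclose(r1, r2, atol=1e-5), torch.allclose(t1, t2, atol=1e-5))  # True True
```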