Closed Kenneth-Wong closed 1 year ago
Hi @Kenneth-Wong,
Thank you for your interest! The only difference between the two is the assumption of a diagonal covariance matrix (which the simple implementation uses); this is discussed in section C of the appendix of my manuscript. I found that the two had similar performance, but the 'Full' head slows down training and makes it less stable.
I began by developing and implementing the 'Full' head and found that it learned a diagonal covariance matrix. This is probably due to stability issues with the covariance matrix inversion, which required heavy regularization (and, of course, most likely led to the diagonal covariance matrix). I therefore implemented the 'simple' head and found that it had a shorter training time and similar performance (with a more stable training schedule as well), which is why I opted to use it in the end. It still calculates the Mahalanobis distance, but under the diagonal covariance assumption (or an IID multivariate Gaussian in embedding space).
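To make the diagonal covariance assumption concrete, here is a minimal sketch in plain Python. It is not the code from prob_deformable_detr.py; the function names and the 2-D toy values are hypothetical and chosen only for illustration. It shows that the diagonal form needs no matrix inversion, and that it agrees with the full Mahalanobis distance whenever the off-diagonal covariance terms are zero.

```python
def diag_mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance assuming a diagonal covariance.

    `var` holds the per-dimension variances, so no matrix inverse is needed.
    """
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def full_mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance with a full 2x2 covariance matrix.

    Uses the closed-form 2x2 inverse; a real implementation would need a
    general (and numerically stable) inverse or linear solve.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


# Hypothetical toy values.
x, mean = (3.0, 1.0), (1.0, 2.0)
var = (4.0, 0.25)

d_diag = diag_mahalanobis_sq(x, mean, var)
d_full_diag_cov = full_mahalanobis_sq_2d(x, mean, ((4.0, 0.0), (0.0, 0.25)))
d_full_corr_cov = full_mahalanobis_sq_2d(x, mean, ((4.0, 0.5), (0.5, 0.25)))
```

With a diagonal covariance the two functions return the same value; they only diverge once the off-diagonal terms are non-zero, which is the case the 'Full' head is meant to capture.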
I opted to include the 'Full' head as well in case anyone wanted to try and use it. I also have some intuition that this head will become more relevant when training on a larger number of classes than on the OWOD benchmarks, but I have never tested this.
I can expand further if you like.
Best,
Orr
Wow, thanks for your detailed reply! Very helpful for me.
Thanks for your great work and detailed answer. So the Mahalanobis distance in your paper just degenerates into a Euclidean distance in the implementation, is that right? :)
You are correct; after batch norm and the IID assumption, all that is left is the Euclidean distance (you can see our Sup. Sec. D, Additional Implementation Details for more).
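The reduction to a Euclidean distance can be checked with a small sketch, again in plain Python with hypothetical names and toy data. It assumes that batch norm acts as a per-dimension standardization (zero mean, unit variance, no learned scale or shift), which may differ from the exact layer configuration in the repository. Under that assumption, the diagonal Mahalanobis distance to the mean equals the squared Euclidean norm of the normalized embedding.

```python
import math


def standardize(batch):
    """Per-dimension standardization over a batch.

    Stands in for batch norm without learned scale/shift (an assumption).
    """
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(d)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j]) for j in range(d)]
            for row in batch]


def diag_mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def squared_euclidean_norm(x):
    """Squared Euclidean distance from the origin."""
    return sum(xi * xi for xi in x)


# Hypothetical batch of 2-D embeddings with very different scales per dimension.
batch = [[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]]
normalized = standardize(batch)

# After standardization the mean is 0 and the variance is 1 in every dimension,
# so the diagonal Mahalanobis distance is just the squared Euclidean norm.
pairs = [(diag_mahalanobis_sq(z, [0.0, 0.0], [1.0, 1.0]), squared_euclidean_norm(z))
         for z in normalized]
```

Each pair in `pairs` holds two equal values, which is the sense in which the distance "degenerates" once the embeddings are normalized.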
OK, I got it. Many thanks for your reply.
Thanks for your great work.
In the file "prob_deformable_detr.py", there are two classes, ProbObjectnessHead and FullProbObjectnessHead, and the latter one (Full) is not used. But it seems that the full one is more consistent with the paper, while ProbObjectnessHead is the one that is used, even though it is very simple. Do these two classes actually play the same role? Can you explain it in more detail? Thanks.