zhihou7 / BatchFormer

CVPR2022, BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning, https://arxiv.org/abs/2203.01522

About shared classifier #3

Closed michaelzfm closed 2 years ago

michaelzfm commented 2 years ago

Thank you for sharing your work. It seems you didn't use the auxiliary classifier in RIDE. I just want to know how you train the auxiliary classifier and make it work in the testing phase to eliminate the gap between training and testing. Is it optimized end-to-end with the CNN and the encoder, or is it not optimized at all?

zhihou7 commented 2 years ago

Actually, I have not figured out why RIDE does not require a shared classifier. It might be because of the three diverse heads in RIDE.

It is optimized with the CNN and the encoder in an end-to-end way. I think the shared classifier is the key point.
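
For reference, here is a minimal sketch of how the two-stream training with a shared classifier can look. The function name, the `TransformerEncoderLayer` configuration, and the dimensions are illustrative, not the exact repository code:

```python
import torch
import torch.nn as nn

def batchformer_step(x, y, batch_former, classifier, training=True):
    """x: backbone features [N, C]; y: labels [N]."""
    if not training:
        # Test time: skip the BatchFormer branch entirely. Since the
        # shared classifier was also trained on plain backbone features,
        # a single sample can be classified without building a mini-batch.
        return classifier(x), y
    pre_x = x  # features before batch attention
    # Treat the batch dimension as a sequence so samples attend to each other.
    x = batch_former(x.unsqueeze(1)).squeeze(1)
    # Both streams (with and without BatchFormer) go through ONE classifier;
    # this shared classifier is what closes the train/test gap.
    x = torch.cat([pre_x, x], dim=0)
    y = torch.cat([y, y], dim=0)
    return classifier(x), y

# Illustrative modules (all dimensions are placeholders):
C, num_classes = 512, 1000
batch_former = nn.TransformerEncoderLayer(d_model=C, nhead=4,
                                          dim_feedforward=C, dropout=0.5)
classifier = nn.Linear(C, num_classes)
```

Because the classifier sees both feature streams during training, the backbone features that bypass BatchFormer remain directly classifiable, which is why the module can be removed at inference.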

WoWeRtc commented 2 years ago

@zhihou7 thanks for the great work. After reading your paper, I am still a bit confused about the introduction of this auxiliary classifier. Why can’t we use the trained BatchFormer during testing?

zhihou7 commented 2 years ago

Hi @WoWeRtc, thanks for your interest. We could actually construct a mini-batch during testing. However, this might limit the applicability of the method, because we might have only a single sample during inference. Although we could store training features to build a mini-batch during testing, that would incur additional memory costs. From that point of view, it would also be unfair to compare with current works under zero-shot/long-tailed learning.
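
For concreteness, a hypothetical test-time variant that builds such a mini-batch from stored training features might look like the sketch below; `infer_with_stored_batch`, `feature_bank`, and the shapes are made up for illustration. The buffer of M stored features is exactly the extra memory cost mentioned above:

```python
import torch

def infer_with_stored_batch(test_feat, feature_bank, batch_former, classifier):
    """test_feat: a single test feature [1, C];
    feature_bank: stored training features [M, C] (hypothetical buffer)."""
    batch = torch.cat([feature_bank, test_feat], dim=0)  # [M+1, C]
    # Let the test sample attend to the stored training samples.
    out = batch_former(batch.unsqueeze(1)).squeeze(1)
    # Only the logits of the appended test sample are needed.
    return classifier(out[-1:])
```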

By the way, we aim not only to mine the sample relationships via a graph/transformer, but also to enable the backbone itself to explore the sample relationships.

Feel free to comment if you have further questions.

WoWeRtc commented 2 years ago

@zhihou7 thanks for the reply, that makes sense to me. However, have you tried using BatchFormer in the case where you can construct a mini-batch during testing?

zhihou7 commented 2 years ago

Yes. If I remove the shared classifier, the model that keeps BatchFormer during testing achieves clearly better performance than the one that drops it at test time, and also better performance than a baseline trained and tested without BatchFormer. With the shared classifier, the two settings usually achieve similar results.

WoWeRtc commented 2 years ago

Thanks so much for the answers.