Actually, I haven't figured out why RIDE does not require a shared classifier. It might be because of the three diverse heads in RIDE.
The auxiliary classifier is optimized together with the CNN and the Transformer encoder in an end-to-end way. I think the shared classifier is the key point.
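Roughly, the training step follows the pseudocode in the paper; here is a minimal sketch, assuming a PyTorch setup (`encoder` stands for the BatchFormer Transformer layer and `classifier` for the shared head; both names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def batchformer_train_step(features, targets, encoder, classifier):
    """One training forward pass with BatchFormer and the shared classifier.

    features:   backbone output, shape [N, C]
    targets:    labels, shape [N]
    encoder:    e.g. nn.TransformerEncoderLayer(d_model=C, nhead=4)
    classifier: a single linear head shared by both streams
    """
    pre_features = features                           # stream without BatchFormer
    # Treat the batch as a length-N sequence so attention runs
    # across samples rather than across spatial positions.
    features = encoder(features.unsqueeze(1)).squeeze(1)
    # Both streams pass through the SAME classifier, which forces the
    # backbone features to stay usable when BatchFormer is dropped at test time.
    all_features = torch.cat([pre_features, features], dim=0)
    all_targets = torch.cat([targets, targets], dim=0)
    return F.cross_entropy(classifier(all_features), all_targets)
```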
@zhihou7 thanks for the great work. After reading your paper, I am still a bit confused about the introduction of this auxiliary classifier. Why can’t we use the trained BatchFormer during testing?
Hi @WoWeRtc, thanks for your interest. We could actually construct a mini-batch during testing. However, this would limit the application, because we might have only a single sample at inference time. Though we could store the training features to build a mini-batch during testing (see the sketch at the end of this comment), doing so would incur additional memory costs. From that perspective, it would also be unfair to compare with current works under zero-shot/long-tailed learning.
By the way, we aim not only to mine sample relationships via a graph/Transformer module, but also to enable the backbone itself to explore sample relationships.
Feel free to comment if you have further questions.
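To make the memory-cost point concrete, keeping BatchFormer at test time would require something like the following purely hypothetical sketch, where `memory_bank` is a cache of training features (all names here are illustrative, not part of the released code):

```python
import torch

def batchformer_test_step(feature, memory_bank, encoder, classifier):
    """Hypothetical test-time use of BatchFormer for a single sample.

    feature:     one test sample's backbone feature, shape [1, C]
    memory_bank: features cached from training, shape [M, C] --
                 this cache is the extra memory cost mentioned above
    """
    # Build a pseudo mini-batch by mixing the test feature with cached ones.
    batch = torch.cat([feature, memory_bank], dim=0)   # [1 + M, C]
    batch = encoder(batch.unsqueeze(1)).squeeze(1)     # cross-sample attention
    return classifier(batch[:1])                       # logits for the test sample only
```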
@zhihou7 thanks for the reply, it makes sense to me. However, have you tried using BatchFormer in the case where you can construct a mini-batch during testing?
Yes. If I remove the shared classifier, the experiment that keeps BatchFormer during testing achieves clearly better performance than the one that drops BatchFormer at test time, and it also outperforms the baseline that uses no BatchFormer in either training or testing. With the shared classifier, the two settings usually achieve similar results.
Thanks so much for the answers.
Thank you for sharing this work. It seems that you didn't use the auxiliary classifier in RIDE. I just want to know how you train the auxiliary classifier and make it work in the testing phase to eliminate the gap between training and testing. Is it optimized together with the CNN and encoder, or is it not optimized?