zhihou7 / BatchFormer

CVPR2022, BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning, https://arxiv.org/abs/2203.01522

doubts about shared classifier in val phase #10

Open Zysty opened 2 years ago

Zysty commented 2 years ago

Hi! Thanks for your sharing.

There may be a discrepancy between the paper and the code. The paper states that "we share the classifier between with or without the BatchFormer during training, which can thus be removed during testing". In the code, however, the output of the val phase, "logits", is the average of "self.logits" and "logits_old". It therefore seems that BatchFormer is still used in the val phase. Could you please clarify?

https://github.com/zhihou7/BatchFormer/blob/01f6fc55fcd5834cb0f06082a1fba633e42d9343/long-tailed_recognition/BalancedSoftmax/run_networks.py#L321
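For context, the shared-classifier idea quoted from the paper can be sketched as follows (a minimal NumPy sketch with made-up toy dimensions and a stand-in `batchformer` function; not the repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, feat_dim, n_classes = 4, 8, 3          # hypothetical toy sizes

W = rng.normal(size=(feat_dim, n_classes))    # the single, shared classifier

def batchformer(x):
    # Stand-in for the transformer applied across the batch dimension:
    # here it simply mixes each sample with the batch mean.
    return 0.5 * x + 0.5 * x.mean(axis=0, keepdims=True)

x = rng.normal(size=(batch, feat_dim))

# Training: both streams go through the SAME classifier W.
logits_old = x @ W               # stream without BatchFormer
logits_new = batchformer(x) @ W  # stream with BatchFormer

# Testing: BatchFormer is removed; only the plain stream remains.
test_logits = x @ W
```

Because `W` is shared, the plain stream alone already produces valid logits at test time, which is why the BatchFormer module can be dropped.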

zhihou7 commented 2 years ago

Hi @Zysty, thanks for your question. That branch is for an ablation study that we do not use in our main experiments; we did not include this ablation in the paper. Note the condition in L320:

https://github.com/zhihou7/BatchFormer/blob/01f6fc55fcd5834cb0f06082a1fba633e42d9343/long-tailed_recognition/BalancedSoftmax/run_networks.py#L320

The program executes L321 only when you set eval_batch. "eval_batch" indicates that we evaluate the method with a mini-batch; I just wanted to check the result when averaging the features before and after BatchFormer. Empirically, this kind of evaluation does not improve performance once the classifier is shared during the training phase.
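In other words, the branching described here could be summarized as follows (a hedged sketch; `eval_batch`, `logits`, and `logits_old` follow the names used in this thread, not the repo's exact code):

```python
def combine_logits(logits, logits_old, eval_batch=False):
    """Default inference uses only the plain-stream logits;
    eval_batch enables the averaged ablation path (L320-L321)."""
    if eval_batch:
        # Ablation: element-wise average of the streams
        # before and after BatchFormer.
        return [(a + b) / 2 for a, b in zip(logits, logits_old)]
    # Default path: logits_old is ignored entirely.
    return logits

print(combine_logits([1.0, 2.0], [3.0, 4.0]))                   # [1.0, 2.0]
print(combine_logits([1.0, 2.0], [3.0, 4.0], eval_batch=True))  # [2.0, 3.0]
```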

Sorry for the redundant code. I mainly conducted the ablation study, the visualized comparison (self.debug), and the gradient analysis based on BalancedSoftmax, so some redundant code remains there.

Feel free to post if you have other questions.

Regards,

Zysty commented 2 years ago

Thank you for the prompt reply. So in practice, "self.logits" is the only term used in the val phase.

Looking forward to hearing good news from BatchformerV2, V3, and so on. Haha :)

zhihou7 commented 2 years ago

Yes. When I infer the model, I do not use eval_batch. It is just for debugging and ablation study.

Thanks. A new work would need to provide a novel insight compared to the current one; otherwise, it would mainly present a generalized version and show the possibility of new model architectures.

zhihou7 commented 2 years ago

Hi @Zysty, thanks for your question. I remember I provided this ablation study in Appendix C.2 of BatchFormerV2, which presents a quantitative illustration.