raoyongming / CAL

[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
MIT License

Output Feature During Inference Stage #1

Closed morgenzhang closed 3 years ago

morgenzhang commented 3 years ago

I quickly checked the model script baseline.py and found that you use the cls_score as the output during inference. I am wondering whether your published results were generated by this instead of by the features before the classifier (which are commonly used in popular ReID frameworks).

CHENGY12 commented 3 years ago

Thanks for your question! Yes, we use the cls_score as the final person feature to calculate the distance during inference, for both CAL and the base model. This is because the effect of the counterfactual attention intervention acts on the cls_score. Besides, we find that using cls_score brings better performance on MSMT17 (79.7 -> 81.4 for the baseline and 80.4 -> 84.2 for CAL), so we use it on all datasets.
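
For anyone skimming this thread, here is a minimal sketch of what this means at inference time. The loader names and the assumption that the model returns both a pre-classifier feature and cls_score are placeholders, not the exact baseline.py code:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract(loader, model, use_cls_score=True):
    """Extract L2-normalized features from a data loader (sketch)."""
    feats = []
    model.eval()
    for imgs, _ in loader:
        # Assumed output signature: a pre-classifier feature and the classifier logits.
        global_feat, cls_score = model(imgs)
        feat = cls_score if use_cls_score else global_feat
        feats.append(F.normalize(feat, dim=1))
    return torch.cat(feats, dim=0)

# Rank gallery images for each query by cosine distance on the chosen feature.
q = extract(query_loader, model, use_cls_score=True)
g = extract(gallery_loader, model, use_cls_score=True)
dist = 1.0 - q @ g.t()
```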

morgenzhang commented 3 years ago

Thanks for your explanation. However, it is still confusing to me to use the predicted classification probabilities as the ID representation for ReID tasks. Unlike standard image classification, person ReID is known to be an open-set problem, so the IDs that appear in the training set will not be visible during inference. Although we can still use cls_score to calculate the distance between query-gallery pairs, such ID-related vectors may not be a reasonable representation for these unknown identities. In addition to cls_score, I think it would also be interesting to consider the influence of attention on metric learning, e.g. the triplet loss.
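
To make the last point concrete, here is a rough sketch of the alternative I have in mind; the "attention-pooled" embeddings below are random stand-ins, not outputs of CAL:

```python
import torch
import torch.nn as nn

# Supervise the embedding directly with a triplet loss and rank by that
# embedding at test time, instead of ranking by cls_score.
triplet = nn.TripletMarginLoss(margin=0.3)

emb_anchor = torch.randn(32, 2048)  # anchor embeddings (batch of 32)
emb_pos = torch.randn(32, 2048)     # embeddings of the same identity
emb_neg = torch.randn(32, 2048)     # embeddings of a different identity

loss = triplet(emb_anchor, emb_pos, emb_neg)
print(loss.item())
```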

CHENGY12 commented 3 years ago

Thank you. The cls_score feature is similar to a feature learned by global_feat + FC, and many existing methods keep an FC layer on top of the backbone CNN, such as VGG. In my opinion, using cls_score as a feature makes sense. The key point of the open-set problem is that the testing IDs are different from the training IDs, not whether the training IDs are visible during inference. A possible explanation is that cls_score uses the observed persons to describe the unseen persons in the testing environment: for a "new" person in the testing set, cls_score describes him/her as being similar to A with a score of 0.4 and to B with 0.2. This is still a reasonable description of the person, since humans also describe an unseen person by saying "he looks like my old friend A". What's more, the performance with cls_score is higher than with global_feat, and we believe that exploring the reason behind this phenomenon is very interesting. Besides, we also think it is interesting to consider the influence of attention on metric learning; we have made some efforts in this direction (please see our other work at https://arxiv.org/abs/2108.05889). However, using causal inference in metric learning is not easy, since the effect of treatment on the treated is difficult to evaluate.
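
A purely illustrative way to picture that description (nothing here comes from the repo; the label-space size and tensors are made up): the softmaxed cls_score of a test image is a distribution over training identities, and matching two images of an unseen person amounts to comparing their distributions.

```python
import torch
import torch.nn.functional as F

num_train_ids = 751  # e.g. a Market-1501-sized training label space (assumption)

# Stand-ins for the classifier logits of two test images of the same unseen person.
cls_score_a = torch.randn(num_train_ids)
cls_score_b = torch.randn(num_train_ids)

# Each image is described as "looks 0.4 like A, 0.2 like B, ..." over training IDs.
desc_a = F.softmax(cls_score_a, dim=0)
desc_b = F.softmax(cls_score_b, dim=0)

# Retrieval then compares these ID-similarity descriptions.
similarity = F.cosine_similarity(desc_a, desc_b, dim=0)
print(similarity.item())
```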

morgenzhang commented 3 years ago

Your explanation about cls_score is quite convincing! And yes, I am also interested in this experimental phenomenon and am planning to do some extended work on my own dataset (I am just a little uneasy that the final feature dimension equals the number of training IDs, which could be considerably large for some in-the-wild datasets). I will also check the paper you mentioned. Thanks a lot for your reply, it has greatly inspired me!