The answer is that single-label datasets like ImageNet are not influenced, because architecture surgery is designed for the explainability task, and feature surgery computes the redundant feature as a common bias for each class. Adding the same bias to every class does not change the ranking or the accuracy; instead, it shifts the scores across images, which benefits mAP for multi-label classification.
If you want to test classification mAP, first record the cosine similarities from the original path with feature surgery, then evaluate them against the ground truth with a package such as torchmetrics.
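As a rough sketch (the variable names below are placeholders, not the repository's API; it only assumes the feature-surgery path gives you image and text feature tensors), the evaluation could look like this:

```python
# Minimal sketch: score each image against every class text embedding with
# cosine similarity, then compute multi-label mAP with torchmetrics.
import torch
from torchmetrics.classification import MultilabelAveragePrecision

def cosine_scores(image_features: torch.Tensor, text_features: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between N image features and C class text features -> (N, C)."""
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    return image_features @ text_features.t()

# Hypothetical tensors for illustration only.
num_classes = 20
image_features = torch.randn(8, 512)                # from the original path with feature surgery
text_features = torch.randn(num_classes, 512)       # one text embedding per class
gt_labels = torch.randint(0, 2, (8, num_classes))   # multi-label ground truth, shape (N, C)

scores = cosine_scores(image_features, text_features)
map_metric = MultilabelAveragePrecision(num_labels=num_classes, average="macro")
print(map_metric(scores, gt_labels))  # macro mAP over classes
```

Because average precision is rank-based, the raw cosine similarities can be fed in directly without converting them to probabilities.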
This issue has been discussed in #13 .