kyungwon213 / AD_ViT

[AVSS2022] Attribute De-biased Vision Transformer (AD-ViT) for Long-Term Person Re-identification

Some questions regarding the evaluation protocol #3

Open darkushin opened 1 year ago

darkushin commented 1 year ago

Dear @kyungwon213 ,

First of all, thank you for your great work!

Summary of the issue: we have two questions regarding the evaluation protocol — the definition of the 'Standard' setting, and the way the gallery is filtered during evaluation — and how these compare to other ReID works.

In more detail:

  1. Regarding the setting you refer to as ‘Standard’ in the paper: under this setting, it seems that you evaluate only query samples that have same-clothes samples in the gallery. Is this correct? Other ReID works refer to this setting as ‘Same-Clothes’ and additionally report a setting (often called ‘General’) that uses all gallery samples without any clothes-based filtering. From the results of previous works reported in your paper, it seems that you compare the ‘Same-Clothes’ setting of your model against the ‘General’ setting of other works. This looks like an incorrect comparison, since those works report higher accuracy in the ‘Same-Clothes’ setting than in the ‘General’ setting. Are we missing something?

  2. Another point regarding the gallery filtering. Usually, during evaluation, the entire gallery is passed to the model, and for a given query sample only the undesired samples of this identity are filtered from the gallery (i.e. same camera and same/different clothes according to the tested setting). For example, under the clothes-changing setting, when examining a query of identity i, only samples of identity i with the same-clothes should be filtered from the gallery. However, when examining your code, it seems that you split the gallery into same-clothes and different-clothes groups in advance and use only these groups during evaluation. This might result in an unfair comparison, as you are using fewer gallery samples, hence having fewer “distracting” samples of different identities. Under the same-clothes setting, we’ve tested a query sample of the LTCC dataset that has only gallery samples with the same-clothes. When filtering the gallery in a similar manner to previous works, it leaves in the gallery 7028 samples out of 7050. However, under your evaluation protocol, the initial gallery size is 1031. As far as we understand according to previous works, in the same clothes-setting only gallery samples of the same identity with different clothes should be removed, while all other gallery samples of different identities should be considered. Can you please explain the logic behind this filtering?