xmed-lab / CLIPN

ICCV 2023: CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
MIT License

Question about baseline results in Tab 2 #5

Closed Z-ZHHH closed 10 months ago

Z-ZHHH commented 11 months ago

Appreciate your impressive work. In Table 2 of the main paper, are the MSP and MaxLogit results reproduced on CLIP or CLIPN? I tested MaxLogit on CLIP (ViT-B/32) with CIFAR-100 (ID) and CIFAR-10 (OOD), but only got 74.8% AUROC.

SiLangWHL commented 10 months ago

Sorry for the late reply. The MSP and MaxLogit results in Table 2 are produced by the original CLIP model. You can check the newly uploaded "./handcrafted/src/zero_shot_infer.py" for the hand-crafted CLIPN; it produces the proper results on CIFAR.
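For reference, the baseline scores discussed here (MSP, MaxLogit, Energy) can be sketched as follows. This is a minimal NumPy sketch of the standard score definitions, not the repository's exact code; the function names are illustrative:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msp_score(logits):
    # MSP: maximum softmax probability over the ID classes.
    return softmax(logits).max(axis=-1)

def maxlogit_score(logits):
    # MaxLogit: maximum raw logit over the ID classes.
    return logits.max(axis=-1)

def energy_score(logits):
    # Energy: logsumexp over the logits (higher = more ID-like).
    m = logits.max(axis=-1)
    return m + np.log(np.exp(logits - m[..., None]).sum(axis=-1))
```

Higher scores are treated as more in-distribution; AUROC is then computed by thresholding these scores over ID and OOD test samples.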

Z-ZHHH commented 10 months ago

Thanks for your reply. I find that different implementations can strongly affect the OOD detection performance, even though the ID classification performance stays similar.

Z-ZHHH commented 10 months ago

It seems that in these lines of ./handcrafted/src/zero_shot_infer.py, the loaded model is the CLIPN model, since the original CLIP does not have checkpoints with epoch numbers.

SiLangWHL commented 10 months ago

That's because there is a hyperparameter, the temperature. I just use the final learned one, 100, instead of manually searching for the best one on the test set for each method and dataset. Besides, whether to apply L2 normalization can also change the performance. Unfortunately, I could not find a consistent rule for setting these two factors across different methods and datasets. As a result, a general and fair choice is to follow the original operation of CLIP.
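To make the two factors concrete, here is a minimal sketch of where the temperature and L2 normalization enter the CLIP-style logits. The function name and parameters are illustrative, not the repository's API; the default temperature of 100 matches the learned logit scale mentioned above:

```python
import numpy as np

def clip_logits(image_feats, text_feats, temperature=100.0, l2_normalize=True):
    # Optionally L2-normalize both feature sets (CLIP's original operation).
    # Skipping this changes the logit scale, which shifts MSP/Energy scores
    # even when the argmax (ID classification) is unchanged.
    if l2_normalize:
        image_feats = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
        text_feats = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    # Cosine similarities scaled by the temperature (logit scale).
    return temperature * image_feats @ text_feats.T
```

This illustrates why ID accuracy can be stable while OOD scores move: temperature and normalization rescale the logits without changing their ranking per sample.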

SiLangWHL commented 10 months ago

The image encoder and text encoder of CLIPN are the same as CLIP's. We freeze them when training the "no" text encoder. That means you can recover the same CLIP model from the CLIPN checkpoints at any epoch.
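The freezing step can be sketched in PyTorch as below. This is an illustrative sketch only: the parameter-name prefix `no_text_encoder` is an assumption, not the repository's actual attribute name:

```python
import torch

def freeze_clip_encoders(model: torch.nn.Module, trainable_prefix: str = "no_text_encoder"):
    # Freeze every parameter except those of the learnable "no" text encoder,
    # so the pretrained CLIP image/text encoders receive no gradient updates.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefix)
```

Because the shared encoders never receive gradients, the CLIP weights stored inside any CLIPN checkpoint are identical to the original pretrained CLIP weights.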

Z-ZHHH commented 10 months ago

Thanks for your quick and detailed reply! I implemented the baseline methods (MSP, MaxLogit, Energy) with the MCM repo on CIFAR-100 and ImageNet, and the results were somewhat different from your reported results on CIFAR-100. Specifically, these are the numbers (average AUROC over the OOD datasets):

Our implementation with the MCM repo (ViT-B/32):

| | MaxLogit | MSP | Energy |
|---|---|---|---|
| CIFAR-100 | 79.5 | 72.5 | 77.3 |
| ImageNet-1k | 84.9 | 78.9 | 82.6 |

Reported results in CLIPN (ViT-B/32):

| | MaxLogit | MSP | Energy |
|---|---|---|---|
| CIFAR-100 | 84.3 | 81.8 | 82.0 |
| ImageNet-1k | 85.6 | 73.3 | 84.9 |

This could be a question for future discussion: what are the potential reasons for these differences when using the CLIP model?
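Since the comparison above rests entirely on AUROC, a minimal sketch of how it can be computed from per-sample scores may help reproduce these numbers. This uses the Mann-Whitney U formulation (probability that a random ID sample scores higher than a random OOD sample, ties counted half); the function name is illustrative:

```python
import numpy as np

def auroc(id_scores, ood_scores):
    # AUROC = P(score_ID > score_OOD) + 0.5 * P(score_ID == score_OOD),
    # computed over all ID/OOD sample pairs.
    id_scores = np.asarray(id_scores, dtype=float)[:, None]
    ood_scores = np.asarray(ood_scores, dtype=float)[None, :]
    greater = (id_scores > ood_scores).mean()
    ties = (id_scores == ood_scores).mean()
    return greater + 0.5 * ties
```

The pairwise formulation is O(n*m) but unambiguous, which makes it useful for cross-checking library implementations when reported numbers disagree.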

SiLangWHL commented 10 months ago

Sure. I am also working on improving the robustness of CLIP-based OOD detection. If you have further questions, feel free to discuss them with me.