I was trying to reproduce the zero-shot prediction results listed in Table 11 of the paper. Although I can reproduce most of the results in Table 11, I found large gaps on the EuroSAT dataset.
We have tried:
Loading the CLIP model with and without JIT
Using different image preprocessing, i.e. with and without center crop
Confirming that the order of categories in promts.py is consistent with the dataset
But we still cannot reproduce the reported numbers in Table 11. Any hints would be greatly appreciated, thank you!
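For reference, the category-order check can be sketched roughly as below. This is only an illustration, not the code behind the numbers in this issue: `find_order_mismatches` and the example lists are hypothetical, and `class_to_idx` is assumed to be a name-to-index mapping like the one torchvision's `ImageFolder` exposes.

```python
def find_order_mismatches(prompt_order, class_to_idx):
    """Compare the class order used for the prompts with the dataset's label indices.

    prompt_order: dataset class names, in the order their prompts are listed.
    class_to_idx: mapping from dataset class name to integer label.
    Returns a list of (index, prompt_side_name, dataset_side_name) for every
    position where the two orders disagree.
    """
    idx_to_class = {idx: name for name, idx in class_to_idx.items()}
    mismatches = []
    for idx, prompt_name in enumerate(prompt_order):
        dataset_name = idx_to_class.get(idx)  # None if the dataset has no such label
        if dataset_name != prompt_name:
            mismatches.append((idx, prompt_name, dataset_name))
    return mismatches


# Illustrative three-class subset (hypothetical, not the full EuroSAT class list).
dataset_labels = {"AnnualCrop": 0, "Forest": 1, "River": 2}
consistent = find_order_mismatches(["AnnualCrop", "Forest", "River"], dataset_labels)
swapped = find_order_mismatches(["Forest", "AnnualCrop", "River"], dataset_labels)
```

An empty result means the prompt order and the label indices line up. Any entry in the list points at a label whose text prompt would be scored against the wrong class.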
Dataset: EuroSAT

| Model name | CLIP | Ours | Delta |
|---|---|---|---|
| ResNet50 | 41.1 | 41.3 | 0.2 |
| ResNet101 | 33.1 | 31.0 | -2.1 |
| RN50x4 | 35.0 | 32.7 | -2.3 |
| RN50x16 | 40.3 | 42.0 | 1.7 |
| ViT-B/16 | 54.1 | 54.6 | 0.5 |
| ViT-B/32 | 49.4 | 44.8 | -4.6 |
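The Delta values above are "Ours" minus "CLIP". A small sketch that recomputes them from the numbers in this issue (the variable names are just for illustration):

```python
# EuroSAT numbers copied from the comparison above.
paper = {"ResNet50": 41.1, "ResNet101": 33.1, "RN50x4": 35.0,
         "RN50x16": 40.3, "ViT-B/16": 54.1, "ViT-B/32": 49.4}
ours = {"ResNet50": 41.3, "ResNet101": 31.0, "RN50x4": 32.7,
        "RN50x16": 42.0, "ViT-B/16": 54.6, "ViT-B/32": 44.8}

# Delta = ours - paper, rounded to one decimal like the reported values.
deltas = {name: round(ours[name] - paper[name], 1) for name in paper}

# Model with the largest shortfall relative to the paper.
worst_model = min(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name:10s} {delta:+.1f}")
```

The gap is not uniform: three models come out slightly above the paper and three below, with ViT-B/32 furthest off.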
JIT applied or not when loading the CLIP model
Dataset: EuroSAT

| Model name | w/ JIT | w/o JIT | Delta |
|---|---|---|---|
| ResNet50 | 41.3 | 41.3 | 0 |
| ResNet101 | 31.0 | 31.0 | 0 |
| RN50x4 | 32.6 | 32.7 | -0.1 |
| RN50x16 | 42.2 | 42.0 | 0.2 |
| ViT-B/16 | 54.6 | 54.6 | 0 |
| ViT-B/32 | 44.8 | 44.8 | 0 |
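A rough sketch of how the JIT and non-JIT variants can be loaded and compared. It assumes the OpenAI `clip` package and its `clip.load(name, device=..., jit=...)` call. `load_clip_pair` and `agreement` are hypothetical helper names, not part of that package, and this is not necessarily how the numbers above were produced.

```python
def load_clip_pair(name="ViT-B/32", device="cpu"):
    """Load the same CLIP checkpoint as a TorchScript (JIT) model and as a regular model."""
    # Imported lazily: the package downloads the weights on first use.
    import clip

    jit_model, preprocess = clip.load(name, device=device, jit=True)
    eager_model, _ = clip.load(name, device=device, jit=False)
    return jit_model, eager_model, preprocess


def agreement(preds_a, preds_b):
    """Fraction of samples on which two lists of predicted labels agree."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must not be empty")
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)


# Toy example with made-up predictions: 3 of 4 labels agree.
example_agreement = agreement([0, 1, 2, 3], [0, 1, 2, 0])
```

Comparing per-sample predictions, rather than only the final accuracy, shows whether the two loading modes differ on individual images or only in rounding.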
Image Preprocessing: Center Crop vs. No Center Crop
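A small sketch of the geometry behind this comparison, under two assumptions: the preprocessing resizes the shorter side to 224 pixels and then takes a 224x224 center crop, and the EuroSAT RGB images are 64x64. The helper names are hypothetical, and the rounding only approximates what an image library would do.

```python
def resize_shorter_side(width, height, n_px):
    """Image size after scaling so the shorter side equals n_px (aspect ratio kept)."""
    if width <= height:
        return n_px, int(n_px * height / width)
    return int(n_px * width / height), n_px


def cropped_fraction(width, height, n_px=224):
    """Fraction of the resized image that an n_px x n_px center crop discards."""
    resized_w, resized_h = resize_shorter_side(width, height, n_px)
    return 1 - (n_px * n_px) / (resized_w * resized_h)


# Square input (assumed EuroSAT size): the crop keeps the whole image.
square_loss = cropped_fraction(64, 64)

# Non-square input for contrast: roughly a quarter of the image is discarded.
landscape_loss = cropped_fraction(640, 480)
```

Under these assumptions the center crop removes nothing from a square image, so this setting would not be expected to change EuroSAT accuracy much.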
Hi @xcpeng, could you share the code you used to get these numbers? With my code I only get 4.42 with B/32 on EuroSAT, while the other datasets give the same performance as Table 11.
Thank you for your work on CLIP!