openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License

Zeroshot evaluation CIFAR100 in Colab: Results ~0.5% below values reported in paper #204


andsteing commented 2 years ago

Hi

Thanks so much for providing this repository and the notebooks!

I'm debugging differences in zero-shot evaluation results between this repository and its JAX port (scenic.projects.baselines.clip), and as part of this work I'm trying to reproduce the exact numbers published in the paper (https://arxiv.org/abs/2103.00020, Table 11).

I created a short Colab based on the provided notebooks where I'm zero-shot evaluating CLIP models on the CIFAR100 dataset: https://colab.research.google.com/github/andsteing/CLIP/blob/zeroshot/notebooks/zeroshot_evaluation.ipynb#scrollTo=Mo-MYo3Flgth

I get the following results:

| model | dataset  | 7 prompts | 80 prompts | Table 11 |
|-------|----------|-----------|------------|----------|
| RN50  | CIFAR100 | 40.93     | 41.04      | 41.6     |
| B/32  | CIFAR100 | 64.58     | 64.21      | 65.1     |
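For reference, the evaluation in the Colab follows the notebooks' prompt-ensembling recipe: normalize the text embedding of each template, average over templates per class, re-normalize, then classify images by cosine similarity. A minimal sketch of that logic, with random arrays standing in for real CLIP embeddings (`embed_dim` and the array shapes are placeholders):

```python
import numpy as np

def build_zeroshot_weights(text_embeds):
    """text_embeds: (num_classes, num_templates, dim) raw text embeddings.
    Normalize each template embedding, average over templates, re-normalize."""
    w = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    w = w.mean(axis=1)
    w = w / np.linalg.norm(w, axis=-1, keepdims=True)
    return w  # (num_classes, dim), each row a unit vector

def zeroshot_predict(image_embeds, class_weights):
    """Cosine-similarity classification: argmax over class weight vectors."""
    img = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
    logits = img @ class_weights.T  # (num_images, num_classes)
    return logits.argmax(axis=-1)

# Placeholder data: 100 classes x 80 templates x 512-dim embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(100, 80, 512))
images = rng.normal(size=(16, 512))
W = build_zeroshot_weights(text)
preds = zeroshot_predict(images, W)
```

With real embeddings, `text` would come from `model.encode_text` on the tokenized prompts and `images` from `model.encode_image` on the CIFAR100 test set.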

So the results are still about 0.5% short of what I would expect after reading the paper.

Any idea what this small difference could be due to?

Best, Andreas

jongwook commented 2 years ago

Hi,

There can be numerical differences that we cannot fully control, e.g. different CUDA and driver versions, batch sizes, hardware, etc., that may cause the 0.5% difference in evals.

That being said, have you tried with the 18 prompts in this document?
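Switching to a different template set only requires re-expanding the prompts per class before tokenization. An illustrative sketch (the template strings below are examples, not the actual set from the repository's prompts document):

```python
# Example templates only; the real set lives in the repository's prompts document.
templates = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a low contrast photo of a {}.",
]

def expand_prompts(classnames, templates):
    """One prompt string per (class, template) pair, ready for clip.tokenize."""
    return {name: [t.format(name) for t in templates] for name in classnames}

prompts = expand_prompts(["apple", "bicycle"], templates)
```

Each class's list of prompts is then tokenized and encoded, and the resulting embeddings are ensembled as in the notebooks.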

Calmepro777 commented 1 year ago

> Hi,
>
> There can be numerical differences that we cannot fully control, e.g. different CUDA and driver versions, batch sizes, hardware, etc., that may cause the 0.5% difference in evals.
>
> That being said, have you tried with the 18 prompts in this document?

Hello,

I would be grateful if the prompts and other settings used for the zero-shot results on CIFAR-100 and CIFAR-10 could be shared.

In research, we would like to know whether our experimental settings are correct and optimal; otherwise reviewers could challenge them.

Regards

Calmepro777 commented 1 year ago

Here are the best results I obtained:

image encoder: ViT-B/16
prompt: "itap of a {label}."

| Dataset   | Reproduced Acc. | Reported Acc. | Gap  |
|-----------|-----------------|---------------|------|
| CIFAR-10  | 90.51           | 91.6          | 1.09 |
| CIFAR-100 | 68.03           | 68.7          | 0.67 |
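For completeness, the accuracy figures above are standard top-1 accuracy: the fraction of test images whose highest-scoring class matches the true label. A small sketch (the logits here are made-up placeholders, not real model outputs):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Percentage of rows whose argmax matches the true label."""
    preds = np.asarray(logits).argmax(axis=-1)
    return 100.0 * (preds == np.asarray(labels)).mean()

# Four images, three classes; true labels are 1, 0, 2, 0.
logits = np.array([[0.2, 0.7, 0.1],
                   [0.9, 0.05, 0.05],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.4, 0.3]])
print(top1_accuracy(logits, [1, 0, 2, 0]))  # → 75.0
```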