andsteing opened this issue 2 years ago
Hi,
There can be numerical differences that we cannot fully control, e.g. different CUDA and driver versions, batch sizes, hardware, etc., that may cause the 0.5% difference in evals.
That being said, have you tried with the 18 prompts in this document?
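For reference, here is a minimal sketch of how prompt ensembling over a set of templates can be done with the `clip` package. The template strings and class names below are illustrative placeholders, not the actual 18 CIFAR prompts from that document:

```python
# Minimal sketch of prompt ensembling with the openai/clip package.
# The templates and class names below are placeholders for illustration only.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

templates = ["a photo of a {}.", "a blurry photo of a {}."]  # placeholder prompts
classnames = ["apple", "bear"]                               # placeholder classes

with torch.no_grad():
    zeroshot_weights = []
    for name in classnames:
        texts = clip.tokenize([t.format(name) for t in templates]).to(device)
        emb = model.encode_text(texts)              # (num_templates, dim)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize each prompt embedding
        emb = emb.mean(dim=0)                       # average over templates
        emb = emb / emb.norm()                      # re-normalize the mean
        zeroshot_weights.append(emb)
    zeroshot_weights = torch.stack(zeroshot_weights, dim=1)  # (dim, num_classes)
```

Image features are then classified by cosine similarity against these per-class ensemble embeddings.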
Hello,
I would be grateful if the prompt and other settings used for the zero-shot results on CIFAR-100 and CIFAR-10 could be shared.
For research purposes, we would like to confirm that our experimental settings are correct and optimal; otherwise reviewers could challenge them.
Regards
Here are the best results I obtained:
Image encoder: ViT-B/16
Prompt: "itap of a {label}."
Dataset | Reproduced Acc. (%) | Reported Acc. (%) | Gap (pp) |
---|---|---|---|
CIFAR-10 | 90.51 | 91.6 | 1.09 |
CIFAR-100 | 68.03 | 68.7 | 0.67 |
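For completeness, here is a minimal sketch of what this single-prompt evaluation might look like with the openai `clip` package and torchvision's CIFAR-10. The batch size, preprocessing, and precision here are assumptions and may not match the run that produced the numbers above:

```python
# Sketch of single-prompt zero-shot evaluation on CIFAR-10 with ViT-B/16.
# Batch size and other details are assumptions, not the exact settings used above.
import clip
import torch
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

dataset = CIFAR10(root="./data", download=True, train=False, transform=preprocess)
loader = DataLoader(dataset, batch_size=256, num_workers=2)

# One text embedding per class, using the single "itap of a {label}." prompt.
texts = clip.tokenize([f"itap of a {c}." for c in dataset.classes]).to(device)
with torch.no_grad():
    text_features = model.encode_text(texts)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    correct = total = 0
    for images, labels in loader:
        image_features = model.encode_image(images.to(device))
        image_features /= image_features.norm(dim=-1, keepdim=True)
        preds = (image_features @ text_features.T).argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"zero-shot accuracy: {100 * correct / total:.2f}%")
```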
Hi
Thanks so much for providing this repository and the notebooks!
I'm debugging diffs in the zeroshot evaluation results from a JAX port of this repository (`scenic.projects.baselines.clip`), and as part of this work I'm trying to reproduce the exact numbers published in the paper https://arxiv.org/abs/2103.00020 (Table 11).

I created a short Colab based on the provided notebooks where I'm zeroshot-evaluating CLIP models on the CIFAR-100 dataset: https://colab.research.google.com/github/andsteing/CLIP/blob/zeroshot/notebooks/zeroshot_evaluation.ipynb#scrollTo=Mo-MYo3Flgth
I get the following results:
So the results are still about 0.5% short of what I would expect after reading the paper.
Any idea what this small difference could be due to?
Best, Andreas
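One way to start narrowing down a sub-percent gap like this (a hedged suggestion, not a confirmed cause) is to measure how much drift plain numerics can introduce, e.g. by comparing fp16/GPU image features against fp32/CPU features from the same checkpoint on a small sample:

```python
# Sketch: compare fp16/GPU image features against fp32/CPU features from the
# same CLIP checkpoint to gauge how much drift pure numerics can introduce.
# The sample size and printed statistics are arbitrary illustrative choices.
import clip
import torch
from torchvision.datasets import CIFAR100

model_gpu, preprocess = clip.load("ViT-B/16", device="cuda")  # fp16 weights on GPU
model_cpu, _ = clip.load("ViT-B/16", device="cpu")            # fp32 weights on CPU

dataset = CIFAR100(root="./data", download=True, train=False, transform=preprocess)
images = torch.stack([dataset[i][0] for i in range(32)])      # small sample of images

with torch.no_grad():
    f_gpu = model_gpu.encode_image(images.cuda()).float().cpu()
    f_cpu = model_cpu.encode_image(images).float()

f_gpu = f_gpu / f_gpu.norm(dim=-1, keepdim=True)
f_cpu = f_cpu / f_cpu.norm(dim=-1, keepdim=True)
print("max abs diff:", (f_gpu - f_cpu).abs().max().item())
print("mean cosine similarity:", (f_gpu * f_cpu).sum(-1).mean().item())
```

If the cosine similarities are essentially 1.0, the remaining gap is more likely due to preprocessing or prompt differences than to precision or hardware.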