Closed by weivision 1 year ago
Hi! Evaluation takes me ~98 seconds per batch @ fp16 (without prompt ensembling or RICES), so that sounds reasonable.
ImageNet eval is bottlenecked by the need to iterate through all 1K classes and collect logprobs for each class. To speed this up, we already cache the vision + language prompts.
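To illustrate why the cost scales with the number of classes, here is a toy sketch of that loop. It is not the repo's implementation: `encode_context`, `class_logprob`, and the character-frequency "model" are hypothetical stand-ins. The only point is that the shared context is encoded once and reused, while every class name still needs its own scoring pass.

```python
import math
from functools import lru_cache

ENCODE_CALLS = {"count": 0}  # counts how often the "expensive" step really runs


@lru_cache(maxsize=None)
def encode_context(context: str) -> tuple:
    """Hypothetical stand-in for the expensive vision + prompt forward pass."""
    ENCODE_CALLS["count"] += 1
    # Toy "state": character counts of the context string.
    return tuple(sorted((ch, context.count(ch)) for ch in set(context)))


def class_logprob(state: tuple, class_name: str, vocab_size: int = 27) -> float:
    """Toy score: add-one-smoothed log-probability of the class-name characters."""
    counts = dict(state)
    total = sum(counts.values())
    return sum(
        math.log((counts.get(ch, 0) + 1) / (total + vocab_size))
        for ch in class_name
    )


def classify(context: str, class_names: list) -> tuple:
    """Score every class against the same cached context and pick the best."""
    scores = {}
    for name in class_names:  # one scoring pass per class (1K for ImageNet)
        state = encode_context(context)  # cache hit after the first class
        scores[name] = class_logprob(state, name)
    best = max(scores, key=scores.get)
    return best, scores


best, scores = classify("a photo of a tabby cat", ["cat", "dog", "car"])
print(best, ENCODE_CALLS["count"])
```

Even with the cache, the per-class loop remains, which is why per-batch time stays high with 1K classes.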
If you're doing model development (rather than a final evaluation), it helps to subset the val set with the --num_samples flag. For reference, DeepMind's Flamingo paper notes that they use 10 imgs / class (10K samples).
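If you want a class-balanced subset like that, a minimal sketch could look like the following. This is an assumption about how one might build it, not a description of how --num_samples picks samples; `subset_per_class` and the `(item, label)` sample format are hypothetical.

```python
import random
from collections import defaultdict


def subset_per_class(samples, per_class=10, seed=0):
    """Return up to `per_class` randomly chosen (item, label) pairs per label."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    subset = []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        k = min(per_class, len(group))  # tolerate classes with few images
        subset.extend(rng.sample(group, k))
    return subset


# Toy stand-in for the 50,000-image val set: 1,000 classes x 50 images each.
val_set = [(f"img_{c}_{i}.jpg", c) for c in range(1000) for i in range(50)]
dev_set = subset_per_class(val_set, per_class=10)
print(len(dev_set))
```

Fixing the seed keeps the dev subset identical across runs, so numbers stay comparable between checkpoints.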
Thank you for your feedback.
How long should evaluation on ImageNet-1k take? I ran the eval script for ImageNet-1k on two A100-40GB GPUs, and it took 13 hours for 383 batches (about 124 s per batch).
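As a quick sanity check on these numbers, the arithmetic can be written out as below. The projection for a 10K-sample subset assumes run time scales linearly with the number of samples and that the full val set has 50,000 images; both are simplifications, and `eval_hours` is just a helper for this example.

```python
def eval_hours(num_batches: int, seconds_per_batch: float) -> float:
    """Total wall-clock hours for a run with a fixed per-batch time."""
    return num_batches * seconds_per_batch / 3600


reported = eval_hours(383, 124)  # the run described above
at_98s = eval_hours(383, 98)     # same batch count at ~98 s per batch
# Assumes linear scaling from 50,000 val images down to 10,000 samples.
subset_10k = reported * 10_000 / 50_000

print(f"{reported:.1f} h, {at_98s:.1f} h, {subset_10k:.1f} h")
```

So 383 batches at 124 s is a little over 13 hours, which matches the reported run, and a 10-images-per-class subset would be roughly a fifth of that under the linear-scaling assumption.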