mlfoundations / open_flamingo

An open-source framework for training large multimodal models.
MIT License

Evaluation on ImageNet1k seems very slow. #267

Closed weivision closed 1 year ago

weivision commented 1 year ago

How long should evaluation on ImageNet1k take? I ran the eval script for ImageNet1k with two A100-40GB GPUs, and it took 13 hours for 383 batches (124s per batch).

i-gao commented 1 year ago

Hi! Evaluation takes me ~98 seconds per batch @ fp16 (without prompt ensembling or RICES), so that sounds reasonable.

ImageNet eval is bottlenecked by the need to iterate through all 1K classes and collect log-probabilities for each one. To help with speed, we've already implemented caching of the vision and language prompts.
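To make the bottleneck concrete, here is a minimal sketch of the per-class scoring loop described above: for each image, every class name is scored by its language-model log-probability and the argmax is returned. This is not the repo's actual eval code; the forward signature (`vision_x`, `lang_x`, `attention_mask`), the prompt template, and the helper name are assumptions for illustration, and the prompt-caching mentioned above is omitted for brevity.

```python
import torch

@torch.no_grad()
def classify_by_logprob(model, tokenizer, vision_x, class_names, device="cuda"):
    """Hypothetical helper: score every ImageNet class name by its LM
    log-probability given the image, then return the argmax class.
    Assumes a Flamingo-style model exposing
    forward(vision_x=..., lang_x=..., attention_mask=...) that returns logits."""
    scores = []
    for name in class_names:  # the loop over all 1K classes dominates eval time
        prompt = f"<image>Output: {name}"  # assumed prompt template
        lang_x = tokenizer(prompt, return_tensors="pt").to(device)
        logits = model(
            vision_x=vision_x.to(device),
            lang_x=lang_x["input_ids"],
            attention_mask=lang_x["attention_mask"],
        ).logits
        # log-prob of each token given its prefix, summed over the prompt
        # (a real implementation would restrict the sum to the class-name
        # tokens and cache the shared image / prompt prefix)
        logprobs = torch.log_softmax(logits, dim=-1)
        targets = lang_x["input_ids"][:, 1:]
        token_lp = logprobs[:, :-1, :].gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        scores.append(token_lp.sum().item())
    return class_names[int(torch.tensor(scores).argmax())]
```

With ~1,000 forward passes per image, per-batch times on the order of 100 seconds at fp16 are expected even with prompt caching.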

If you're doing model development (rather than a full evaluation), it's helpful to subset the val set with the --num_samples flag; for reference, DeepMind's Flamingo paper notes that they evaluate on 10 images per class (10K samples).
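If you want the same 10-images-per-class subset outside of the --num_samples flag, a rough sketch is below. It assumes a torchvision-style ImageNet dataset exposing a `.targets` list; it is not how the repo's flag actually subsamples.

```python
import random
from collections import defaultdict

def subset_per_class(dataset, n_per_class=10, seed=0):
    """Pick n_per_class indices per label (10 imgs/class -> 10K samples,
    as in the Flamingo paper). Sketch only; assumes dataset.targets exists."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(dataset.targets):
        by_class[label].append(idx)
    keep = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        keep.extend(idxs[:n_per_class])
    return sorted(keep)

# usage: val_subset = torch.utils.data.Subset(imagenet_val, subset_per_class(imagenet_val))
```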

weivision commented 1 year ago

Thank you for your feedback.