microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Kosmos-2 batch modality and processing speed #1604

Open unrue opened 4 months ago

unrue commented 4 months ago

I'm using Kosmos-2, following the model card at

https://huggingface.co/microsoft/kosmos-2-patch14-224

with the simple prompt "An image of".

The process works well, but it is quite slow. Processing 1000 images takes about three and a half hours on an NVIDIA A100 GPU. Is it possible to enable batch processing? At the moment, images are processed one at a time.
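To make the request concrete, here is a minimal sketch of the kind of batched loop I have in mind. It is only an illustration: `caption_batch` is a hypothetical callable that takes a list of image paths and returns one caption per path, not an existing Kosmos-2 or Hugging Face function, and `batch_size=8` is an arbitrary choice. The `seconds_per_image` helper just restates the timing above (about 12.6 seconds per image, assuming the run covered 1000 images).

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def caption_all(
    paths: Sequence[str],
    caption_batch: Callable[[List[str]], List[str]],  # hypothetical batched captioner
    batch_size: int = 8,  # arbitrary value chosen for illustration
) -> List[str]:
    """Caption every path by calling `caption_batch` once per batch."""
    captions: List[str] = []
    for batch in chunked(paths, batch_size):
        captions.extend(caption_batch(batch))
    return captions


def seconds_per_image(total_hours: float, n_images: int) -> float:
    """Average wall-clock seconds spent per image."""
    return total_hours * 3600.0 / n_images
```

With a loop like this, the model would be called once per batch instead of once per image; whether that actually reduces the total time depends on how the model and processor handle batched inputs.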

Are there any other hints for improving processing speed? Thanks.