microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

1 Click Windows, RunPod & Linux Installer for Kosmos-2 with Batch Image captioning feature - not an issue #1461

Open FurkanGozukara opened 6 months ago

FurkanGozukara commented 6 months ago

After working on it for a whole day, I finally finished coding it and published it.

I also found a prompt that produces better captions.

You can download the auto installer here: https://www.patreon.com/posts/90744385

The batch image captioning models we currently have are as follows:

- CogVLM with 4-bit, 8-bit, and 16-bit quantization
- LLaVA, including 34b, with 4-bit, 8-bit, and 16-bit quantization
- Blip2 models
- Clip Vision models
- Kosmos-2 model

Kosmos-2 supports both single-image captioning and batch image captioning. I also did some research to find a good prompt.
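For anyone wondering what batch captioning amounts to, here is a minimal sketch of the general idea, not the code from the installer. It walks a folder, captions each image, and writes the caption to a `.txt` file with the same name. The function name `caption_folder`, the extension list, and the `caption_fn` callback are all assumptions made for illustration; in practice `caption_fn` would wrap whichever model you load (Kosmos-2, LLaVA, and so on).

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed set of image types; adjust to whatever your dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def caption_folder(
    folder: str,
    caption_fn: Callable[[Path], str],
    overwrite: bool = False,
) -> Dict[str, str]:
    """Caption every image in `folder` and save each caption as <image name>.txt.

    `caption_fn` is a hypothetical callback that takes an image path and
    returns a caption string; plug in your own model call here.
    """
    results: Dict[str, str] = {}
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files, including earlier .txt captions
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists() and not overwrite:
            continue  # keep captions from an earlier run
        caption = caption_fn(image_path).strip()
        caption_path.write_text(caption, encoding="utf-8")
        results[image_path.name] = caption
    return results
```

Skipping images that already have a caption file lets an interrupted run resume without redoing work; pass `overwrite=True` to recaption everything, for example after changing the prompt.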

One-click install on Windows, RunPod, and Linux.

It generates its own venv, so it will never conflict with any other app you have.
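As a rough illustration of why a dedicated venv avoids conflicts, the sketch below creates an isolated environment inside the app folder using Python's standard `venv` module. This is not the installer's actual script; the function name `create_isolated_env` and the folder layout are assumptions.

```python
import sys
import venv
from pathlib import Path


def create_isolated_env(app_dir: str, env_name: str = "venv", with_pip: bool = True) -> Path:
    """Create a virtual environment under `app_dir` and return its Python path.

    Packages installed with this interpreter stay inside `app_dir/env_name`,
    so they cannot clash with the system Python or with other apps.
    """
    env_dir = Path(app_dir) / env_name
    if not (env_dir / "pyvenv.cfg").exists():
        # Standard-library call; with_pip=True bootstraps pip into the venv.
        venv.create(env_dir, with_pip=with_pip)
    # Windows keeps executables in Scripts\, Linux (including RunPod) in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Running `<returned path> -m pip install -r requirements.txt` then installs the dependencies only into that environment.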

Here is the news about them: https://www.patreon.com/posts/sota-image-model-98499462

[Image: kosmos2_2]

[Image: kosmos2]

[Image: scripts_arsenal_full_screenshot]

FurkanGozukara commented 6 months ago

I just released a new update.

16 February 2024 Update: 4-bit, 8-bit, 16-bit, and 32-bit loading options have been added to Kosmos-2. I think Kosmos-2 is the very best caption model if you have a lower-VRAM GPU: 4-bit Kosmos-2 uses only 2 GB of VRAM, and 32-bit uses only 7.5 GB. Please reinstall with the newest Kosmos-2_v2.zip.
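To see why lower-bit loading saves so much VRAM, here is a back-of-the-envelope estimate of the memory needed for the weights alone. The parameter count of roughly 1.6 billion is an assumption for illustration, not a figure from this post, and the helper name `weight_memory_gb` is made up. Measured usage, such as the 2 GB and 7.5 GB figures above, will be higher than this estimate because activations, the CUDA context, and any layers left unquantized also take memory.

```python
# Assumed parameter count for illustration only (about 1.6 billion).
ASSUMED_NUM_PARAMS = 1.6e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for bits in (4, 8, 16, 32):
        size = weight_memory_gb(ASSUMED_NUM_PARAMS, bits)
        print(f"{bits:>2}-bit weights: about {size:.1f} GB")
```

Under that assumption the weights come to roughly 0.8 GB at 4-bit and 6.4 GB at 32-bit, which is in line with the reported totals once the extra runtime overhead is added.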