Closed by ashvardanian 9 months ago
Very impressive for a 1.5B model, what's the license for it?
Thank you, @lin72h! It's Apache 2.0, like the rest.
:tada: This PR is included in version 1.0.0 :tada:
The release is available on GitHub releases
Your semantic-release bot :package::rocket:
UForm is going Generative!
The UForm family of tiny multimodal transformer models just got bigger! In addition to the existing CLIP-like embedding models, we now have a generative model useful for image captioning, visual question answering, and multimodal chats. All that in just a billion parameters, small enough to fit even on mobile devices 📱
- Repository: https://github.com/unum-cloud/uform
- Generative model: https://huggingface.co/unum-cloud/uform-gen
- Chat model: https://huggingface.co/unum-cloud/uform-gen-chat
Evaluation Metrics
Being the smallest model of its kind, unum-cloud/uform-gen is hard to compare to others. Next in size are the 5x larger LLaVA and InstructBLIP, with 7 billion parameters. LLaVA performs noticeably better on VQAv2: 78.5 vs 66.5. On captioning, CLIPScore and RefCLIPScore are relatively close across all models.

[Table: VQAv2, CLIPScore, and RefCLIPScore for llava-hf/llava-1.5-7b-hf, Salesforce/instructblip-vicuna-7b, unum-cloud/uform-gen, and unum-cloud/uform-gen-chat]
Throughput
On an RTX 3090, using vanilla PyTorch for inference with `bfloat16` arithmetic and greedy decoding, one should expect the following throughput.

[Table: throughput for llava-hf/llava-1.5-7b-hf, Salesforce/instructblip-vicuna-7b, and unum-cloud/uform-gen]
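Throughput numbers like these are typically obtained by timing `generate` under greedy decoding and dividing the count of newly produced tokens by the elapsed wall-clock time. Below is a minimal sketch of that methodology, not the exact benchmark script used here: the loading calls follow the generic Hugging Face `transformers` API, and `uform-gen` additionally expects an image through its processor, which is omitted for brevity.

```python
import time


def tokens_per_second(new_tokens: int, elapsed: float) -> float:
    """Throughput: newly generated tokens divided by wall-clock seconds."""
    return new_tokens / elapsed


def benchmark_greedy_decoding(model_id: str = "unum-cloud/uform-gen",
                              max_new_tokens: int = 128) -> float:
    """Hypothetical benchmark sketch (requires a CUDA GPU and downloads the
    model); the generic AutoModel loading here is an assumption, and the real
    multimodal input (an image via the processor) is elided."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bfloat16 arithmetic, as in the post
        trust_remote_code=True,
    ).cuda().eval()
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

    inputs = tokenizer("Describe the image.", return_tensors="pt").to("cuda")

    torch.cuda.synchronize()  # make timing accurate across async CUDA calls
    start = time.perf_counter()
    with torch.inference_mode():
        out = model.generate(**inputs,
                             do_sample=False,  # greedy decoding
                             max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
    return tokens_per_second(new_tokens, elapsed)
```

Note that the prompt length is subtracted from the output length, so only generated tokens count toward throughput; including prompt tokens would inflate the numbers.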