unum-cloud / uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and πŸ”œ video, up to 5x faster than OpenAI CLIP and LLaVA πŸ–ΌοΈ & πŸ–‹οΈ
https://unum-cloud.github.io/uform/
Apache License 2.0

Generative models #53

ashvardanian closed this issue 9 months ago

ashvardanian commented 9 months ago

UForm is going Generative!

The UForm family of tiny multimodal transformer models just got bigger! In addition to the existing CLIP-like embedding models, we now have a generative model useful for image captioning, visual question answering, and multimodal chats. All that in just 1.5 billion parameters, small enough to fit even on mobile devices πŸŽ‰

Repository: https://github.com/unum-cloud/uform
Generative model: https://huggingface.co/unum-cloud/uform-gen
Chat model: https://huggingface.co/unum-cloud/uform-gen-chat

Evaluation Metrics


Being the smallest model of its kind, unum-cloud/uform-gen is hard to compare to others. Next in size are the 5x larger LLaVAs and InstructBLIP, with 7 billion parameters. LLaVA performs noticeably better on VQAv2: 78.5 vs 66.5. On captioning, CLIPScore and RefCLIPScore are relatively close across all models.

| Model | Size | Caption Length | CLIPScore | RefCLIPScore |
| :--- | :---: | :---: | :---: | :---: |
| `llava-hf/llava-1.5-7b-hf` | 7B | Long | 0.878 | 0.529 |
| `llava-hf/llava-1.5-7b-hf` | 7B | Short | 0.886 | 0.531 |
| `Salesforce/instructblip-vicuna-7b` | 7B | Long | 0.902 | 0.534 |
| `Salesforce/instructblip-vicuna-7b` | 7B | Short | 0.848 | 0.523 |
| `unum-cloud/uform-gen` | 1.5B | Long | 0.847 | 0.523 |
| `unum-cloud/uform-gen` | 1.5B | Short | 0.842 | 0.522 |
| `unum-cloud/uform-gen-chat` | 1.5B | Long | 0.860 | 0.525 |
| `unum-cloud/uform-gen-chat` | 1.5B | Short | 0.858 | 0.525 |
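For reference, CLIPScore is a reference-free captioning metric: the cosine similarity between CLIP's image embedding and the candidate caption's text embedding, clipped at zero and scaled by a constant (2.5 in the original paper). A minimal pure-Python sketch of that formula (illustrative only, not the exact evaluation code used for the numbers above):

```python
import math


def clipscore(image_emb, caption_emb, w=2.5):
    """CLIPScore sketch: w * max(cos(image_emb, caption_emb), 0)."""
    dot = sum(a * b for a, b in zip(image_emb, caption_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in caption_emb))
    return w * max(dot / (norm_img * norm_txt), 0.0)
```

A caption whose embedding points in the same direction as the image scores the maximum 2.5; an unrelated or opposite caption is clipped to 0.
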

Throughput

On an RTX 3090, using vanilla PyTorch for inference with bfloat16 arithmetic and greedy decoding, one can expect the following throughput.

| Model | Size | Speed | Speedup |
| :--- | :---: | :---: | :---: |
| `llava-hf/llava-1.5-7b-hf` | 7B | ~ 40 tokens/second | |
| `Salesforce/instructblip-vicuna-7b` | 7B | ~ 40 tokens/second | |
| `unum-cloud/uform-gen` | 1.5B | ~ 140 tokens/second | Γ— 3.5 |
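Greedy decoding, used for the numbers above, simply picks the highest-scoring token at every generation step (no sampling, no beams). A toy pure-Python sketch of the idea (the function and logits are illustrative, not the UForm API):

```python
def greedy_decode(step_logits):
    """Pick the argmax token id at each decoding step.

    step_logits: a list of per-step logit lists, one entry per
    generated token position. Returns the chosen token ids.
    """
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in step_logits]
```

Greedy decoding is deterministic and cheap, which makes throughput comparisons like the table above straightforward: at ~140 vs ~40 tokens/second, the reported speedup is the 3.5Γ— shown.
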
lin72h commented 9 months ago

Very impressive for 1.5B Model, what's the license for it?

ashvardanian commented 9 months ago

Thank you, @lin72h! It’s Apache 2.0, like the rest.

ashvardanian commented 9 months ago

:tada: This PR is included in version 1.0.0 :tada:

The release is available on GitHub.

Your semantic-release bot :package::rocket: