mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License

Vram? #3

Closed TingTingin closed 1 year ago

TingTingin commented 1 year ago

How much VRAM does it take to run this for both the Vicuna and Flan versions? Can it be run on consumer GPUs? Also, will the transformers library be supported in the future?

gordonhu608 commented 1 year ago

Thanks for your interest in our work. It takes around 20G to run the Vicuna (7B) version with fp16 and around 28G for the FlanT5-XXL (11B) version. So the Vicuna version can fit on a consumer GPU (24G).
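A back-of-envelope check of those numbers, assuming fp16 weights take 2 bytes per parameter; the few extra GB on top of the LLM weights come from the ViT visual encoder, the Q-Former, and runtime activations:

```python
# Rough fp16 weight-memory estimate: parameters * 2 bytes.
def fp16_weight_gb(params_billion: float) -> float:
    return params_billion * 1e9 * 2 / 1024**3

print(f"Vicuna-7B  weights: ~{fp16_weight_gb(7):.1f} GB")   # ~13.0 GB
print(f"FlanT5-XXL weights: ~{fp16_weight_gb(11):.1f} GB")  # ~20.5 GB
# Plus the vision encoder, Q-Former, and activations, which is roughly
# consistent with the ~20G / ~28G figures reported above.
```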

gordonhu608 commented 1 year ago

Another option is to try int8, which can save a lot of memory without a performance drop.
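For context, a minimal sketch of the usual int8 loading pattern with transformers + bitsandbytes, shown here on a standalone Vicuna checkpoint purely for illustration; BLIVA itself is not loaded through transformers at this point, and the checkpoint id below is just an example:

```python
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id for illustration only; BLIVA's own loader is separate.
model_id = "lmsys/vicuna-7b-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
    device_map="auto",   # let accelerate place layers on GPU/CPU
)
```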

TingTingin commented 1 year ago

Will BLIVA support the transformers library in the future?

gordonhu608 commented 1 year ago

Yes, we will support the Hugging Face model hub soon.

TingTingin commented 1 year ago

Is it possible to run this with 8GB VRAM? If so, how?

gordonhu608 commented 1 year ago

We are still investigating int8 inference. Our initial attempt does not seem to work well. Trying DeepSpeed with CPU offloading could be a good solution.
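A minimal sketch of what "DeepSpeed with CPU" could look like, using ZeRO-3 parameter offloading (ZeRO-Inference style). Here `model` is assumed to be an already constructed BLIVA model in fp16; the exact integration with this repo's loading code is not shown:

```python
import deepspeed
import torch

# ZeRO stage 3 with parameters offloaded to CPU: weights are streamed to the
# GPU layer by layer, trading speed for a much smaller VRAM footprint.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_batch_size": 1,
    "train_micro_batch_size_per_gpu": 1,
}

# `model` is assumed to be the BLIVA model instance built by this repo's code.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

with torch.no_grad():
    # run the usual forward / generate call through engine.module here
    ...
# Typically launched with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus 1 run_inference.py
```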