Closed — TingTingin closed this issue 1 year ago.
How much VRAM does it take to run this on both the Vicuna and FlanT5 versions, and can it be run on consumer GPUs? Also, will the transformers library be supported in the future?
Thanks for your interest in our work. It takes around 20G to run the Vicuna (7B) version with fp16 and around 28G for the FlanT5-XXL (11B) version, so the Vicuna version fits on a consumer GPU (24G).
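For reference, a minimal sketch of loading the Vicuna version in fp16, assuming the repo's LAVIS-style loader (`load_model_and_preprocess`, the `bliva_vicuna`/`vicuna7b` names, and the argument layout are assumptions taken from that style of API; check the demo script for the exact call):

```python
import torch
from bliva.models import load_model_and_preprocess  # assumed LAVIS-style loader

# Load on CPU first, cast to fp16, then move to the GPU, so the fp32
# weights never have to fit in VRAM at once.
model, vis_processors, _ = load_model_and_preprocess(
    name="bliva_vicuna", model_type="vicuna7b", is_eval=True, device="cpu"
)
model = model.half().to("cuda")  # ~20G of VRAM for the 7B version in fp16
```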
Another option is to use int8, which can save a lot of memory without a performance drop.
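Int8 loading is usually done through the bitsandbytes integration in transformers. Since BLIVA itself is not on the Hugging Face Hub yet (see below), here is the generic pattern shown on the underlying Vicuna language model as an illustration; the checkpoint id is an example, not something BLIVA ships:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint id for illustration only; BLIVA's own weights are
# loaded through its repo scripts, not via from_pretrained.
model_id = "lmsys/vicuna-7b-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # bitsandbytes int8 quantization of linear layers
    device_map="auto",   # place weights automatically across GPU and CPU
)
```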
Will BLIVA support the transformers library in the future?
Yes, we will support the Hugging Face model hub soon.
Is it possible to run this with 8GB of VRAM? If so, how?
We are still investigating int8 inference; our initial attempt did not work well. Trying DeepSpeed with CPU offloading could be a good solution.
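For the DeepSpeed route, the usual pattern is ZeRO stage 3 with parameters offloaded to CPU (ZeRO-Inference). A rough, untested-with-BLIVA sketch; the config values are illustrative, and `model` is the BLIVA model constructed as in the fp16 sketch above:

```python
import deepspeed

# ZeRO stage 3 with parameters offloaded to CPU: weights stream from host
# RAM to the GPU layer by layer, trading speed for a much smaller VRAM footprint.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,  # required by DeepSpeed even for inference
}

engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()
```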