penghao-wu / vstar

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
https://vstar-seal.github.io/
MIT License
497 stars 32 forks

How much memory do I need to run the demo? #2

Closed kexul closed 8 months ago

kexul commented 8 months ago

Hi, I'd like to run the demo on a 4090 with 24 GB of VRAM. Is that enough?

penghao-wu commented 8 months ago

Hi, since we use two VLMs, the default setting requires around 28 GB of memory. If you do need to run it on a single 4090, you could try loading and running the VQA model and the visual search model separately.
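As a rough back-of-envelope check (my own sketch, not from the repo), the weight memory of a model scales with parameter count times bytes per parameter, so halving the precision from fp16 (2 bytes) to int8 (1 byte) roughly halves the weight footprint. This ignores activations, the KV cache, and framework overhead, so real usage will be somewhat higher:

```python
def approx_weight_vram_gib(n_params_billion: float, bytes_per_param: int) -> float:
    """Estimate GPU memory for model weights alone, in GiB.

    Ignores activations, KV cache, and framework overhead, so it is a
    lower bound on real usage.
    """
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# Example: a 7B-parameter model (assumed size, for illustration only)
fp16_gib = approx_weight_vram_gib(7, 2)  # ~13.0 GiB in fp16
int8_gib = approx_weight_vram_gib(7, 1)  # ~6.5 GiB in 8-bit

print(f"fp16: {fp16_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
```

This is why 8-bit loading can bring a two-model setup that needs ~28 GB under a 24 GB budget: each model's weight footprint is roughly halved.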

kexul commented 8 months ago

Thanks! I enabled `load_8bit` here: https://github.com/penghao-wu/vstar/blob/d10c0537b754ee14e33744c55649301e854aebd4/LLaVA/llava/model/builder.py#L26 and now it runs fine within my 24 GB of memory. Thanks for your awesome work and for open-sourcing it!