showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Apache License 2.0

Load in 4 bit? #4

Closed johnwick123f closed 2 days ago

johnwick123f commented 1 week ago

Is there a way to load this in 4 bit? That would help a lot for users with low vram! Btw, great project!

chenjoya commented 1 week ago

Thank you for your interest! I will take a look at bitsandbytes soon and will update by Wednesday.

chenjoya commented 5 days ago

Sorry, there are some bugs when I use the bitsandbytes quantization_config:

ValueError: weight is on the meta device, we need a `value` to put in on 0.

which may be due to the extra connector layer:

self.connector = torch.nn.Sequential(
    torch.nn.Linear(config.vision_hidden_size, config.hidden_size, bias=True),
    GELUActivation(config.hidden_size),
    torch.nn.Linear(config.hidden_size, config.hidden_size, bias=True),
)

The nn.Sequential may make it impossible to retrieve the weight (I guess?)... I still recommend using a GPU with more memory...

johnwick123f commented 2 days ago

@chenjoya oh ok, thanks anyway. I'll try to use it with a higher-memory GPU!