mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License

Is Video Supported? #7

Open ssmmoo1 opened 11 months ago

ssmmoo1 commented 11 months ago

Can this model accept video as an input?

gordonhu608 commented 11 months ago

Thank you for your interest in our work. Yes, it also supports video input; see these lines of code: https://github.com/mlpc-ucsd/BLIVA/blob/b45425a7c87d01ecc075d86c9f2376689a1c80db/bliva/models/bliva_vicuna7b.py#L298-L306
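
The thread does not say how frames should be chosen from a clip or what tensor layout the linked code expects, so check the linked lines before relying on any of this. As a rough, model-agnostic sketch of one common preprocessing step, here is uniform frame sampling in plain Python. The function name `sample_frame_indices` and its parameters are hypothetical and are not part of the BLIVA codebase.

```python
from typing import List


def sample_frame_indices(num_frames_total: int, num_samples: int) -> List[int]:
    """Pick frame indices spread evenly across a clip (hypothetical helper).

    Assumption: the caller decodes these frames itself and stacks them into
    whatever layout the model's video branch expects.
    """
    if num_frames_total <= 0 or num_samples <= 0:
        raise ValueError("both arguments must be positive")
    # Never ask for more frames than the clip contains.
    num_samples = min(num_samples, num_frames_total)
    # Split the clip into equal-length segments and take each segment's midpoint.
    segment = num_frames_total / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # Indices for 4 frames taken from a 100-frame clip.
    print(sample_frame_indices(100, 4))
```

Taking segment midpoints rather than the first frame of each segment keeps the samples away from the very start of the clip; other strategies (random offsets, fixed frame rate) would work just as well if the model tolerates them.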