haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
19.59k stars, 2.16k forks

[Question] Predictions in video by stitching consecutive frames into a single image #916

[Open] shashi-netra opened this issue 9 months ago

shashi-netra commented 9 months ago

Question

I am looking to use LLaVA for predictions on video by stitching a sequence of consecutive frames into a single image and then asking LLaVA for a prediction. Has anyone used this approach before and had any success? If so, please share any tips on how you approached it.
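A minimal sketch of the stitching step, assuming Pillow is installed and the frames are already decoded into `PIL.Image` objects in temporal order. The helper name `stitch_frames`, the row-major (left-to-right, top-to-bottom) layout, and the default 336x336 tile size are illustrative choices, not part of LLaVA.

```python
import math

from PIL import Image


def stitch_frames(frames, cols=None, tile_size=(336, 336)):
    """Tile consecutive frames into one grid image (hypothetical helper).

    frames: list of PIL.Image objects in temporal order.
    cols: number of grid columns; defaults to a roughly square grid.
    tile_size: (width, height) each frame is resized to before pasting.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    if cols is None:
        cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    tw, th = tile_size
    # Black canvas; any unused cells at the end of the grid stay black.
    grid = Image.new("RGB", (cols * tw, rows * th), color=(0, 0, 0))
    for i, frame in enumerate(frames):
        # Row-major placement: frame i goes to row i // cols, column i % cols.
        r, c = divmod(i, cols)
        tile = frame.convert("RGB").resize(tile_size)
        grid.paste(tile, (c * tw, r * th))
    return grid
```

If the model's image processor downscales the whole grid to a single fixed input resolution (as CLIP-style preprocessors do), each frame's effective resolution shrinks as the grid grows, so a small grid such as 2x2 is probably a safer starting point. Stating the layout in the prompt (for example, "four consecutive frames, ordered left to right, top to bottom") may also help the model read the temporal order.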

mhkz commented 5 months ago

Hi, I now also need to run predictions on videos. Do you have a better solution? My current approach is to extract frames from the video and run predictions on them.
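For the frame-extraction approach, one common choice is uniform sampling: split the clip into equal segments and take the middle frame of each. Below is a minimal pure-Python sketch of the index selection only; `sample_frame_indices` is a hypothetical helper, and decoding the chosen frames is left to a video library such as OpenCV or decord.

```python
def sample_frame_indices(num_frames, num_samples):
    """Return num_samples frame indices spread evenly over a clip.

    The clip is split into num_samples equal-length segments and the
    centre frame of each segment is chosen. If the clip has fewer frames
    than requested, every frame is returned once.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    num_samples = min(num_samples, num_frames)
    seg = num_frames / num_samples
    # Centre of segment i is seg * i + seg / 2; int() floors it to a frame index.
    return [int(seg * i + seg / 2) for i in range(num_samples)]
```

For a 100-frame clip and 4 samples this picks frames 12, 37, 62 and 87. The sampled frames can then be sent to LLaVA one at a time or combined into a single grid image as described in the original question.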