I am looking to use LLaVA for predictions on video by stitching a sequence of consecutive frames into a single image and then asking LLaVA for a prediction. Has anyone tried this approach and had any success? If so, do you have any tips on how you approached it?
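To make the stitching step concrete, here is a minimal sketch of one way it could be done, assuming Pillow is installed and the frames are already decoded as PIL images. The function name `stitch_frames`, the grid layout, and the default cell size are illustrative assumptions, not an established recipe for LLaVA.

```python
from PIL import Image


def stitch_frames(frames, cols=4, cell_size=(336, 336)):
    """Tile frames left-to-right, top-to-bottom into a single grid image.

    `cols` and `cell_size` are arbitrary illustrative defaults.
    """
    if not frames:
        raise ValueError("need at least one frame")
    rows = (len(frames) + cols - 1) // cols  # ceiling division
    cell_w, cell_h = cell_size
    grid = Image.new("RGB", (cols * cell_w, rows * cell_h))
    for i, frame in enumerate(frames):
        # Resize every frame to the same cell so the grid stays regular.
        tile = frame.convert("RGB").resize((cell_w, cell_h))
        grid.paste(tile, ((i % cols) * cell_w, (i // cols) * cell_h))
    return grid


if __name__ == "__main__":
    # Synthetic stand-ins for consecutive video frames.
    demo_frames = [Image.new("RGB", (640, 480), (i * 30, 0, 0)) for i in range(8)]
    stitched = stitch_frames(demo_frames)
    print(stitched.size)
```

The stitched image would then be passed to the model as a single input, with a prompt that explains the frame order. Keep in mind that a model with a fixed input resolution will downscale the whole grid, so each additional frame leaves less detail per frame.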