stavsap / comfyui-ollama

Apache License 2.0

How to summarize a video with VLM models #48

Open whmc76 opened 4 days ago

stavsap commented 3 days ago

Sample a few frames from the video and use the Ollama vision node, I guess. The frame sampling might be tricky, though. Maybe try an existing video-editing node that can break the video into images at a certain fps.
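For anyone doing the sampling outside a node, here is a minimal sketch of the "certain fps" idea: pick which frame indices to keep so the result has roughly the requested rate. The function name `sample_frame_indices` and its parameters are hypothetical, not part of comfyui-ollama; reading the actual frames (for example with a video-loading node or a library such as OpenCV) is left out.

```python
def sample_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> list[int]:
    """Return the indices of frames to keep so the output runs at about target_fps.

    Hypothetical helper for illustration; it only computes indices and does
    not decode any video.
    """
    if total_frames <= 0:
        return []
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")

    # Distance between kept frames, measured in source frames.
    # Clamp to 1 so we never try to sample faster than the source.
    step = max(source_fps / target_fps, 1.0)

    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += step
    return indices


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled at one frame per second.
    print(sample_frame_indices(300, 30.0, 1.0))
```

Keeping the count small matters here, since every kept frame becomes an image the vision model has to process.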

whmc76 commented 3 days ago

I tried some small snippets, but it looks like the VLM just reasons about each frame separately, and I only get a description of what happened in the last frame.
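One possible way around the per-frame behaviour is to send all sampled frames in a single request with a prompt that says they belong to one clip. The sketch below talks to Ollama's `/api/generate` endpoint directly rather than through the ComfyUI node. The helper names, the `llava` model name, and the default local URL are assumptions, and whether a given model actually reasons across several images depends on the model.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_video_request(frames: list[bytes], model: str = "llava") -> dict:
    """Build one request body that carries every frame (hypothetical helper).

    `frames` holds encoded image bytes (e.g. JPEG or PNG) in chronological order.
    """
    prompt = (
        f"These {len(frames)} images are frames sampled in order from one video. "
        "Describe what happens across the whole clip, not just in a single frame."
    )
    return {
        "model": model,
        "prompt": prompt,
        # The API expects base64-encoded image strings.
        "images": [base64.b64encode(frame).decode("ascii") for frame in frames],
        "stream": False,
    }


def summarize_frames(frames: list[bytes], model: str = "llava") -> str:
    """Send the combined request and return the model's text answer."""
    body = json.dumps(build_video_request(frames, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

If the model still only describes one image, fewer frames or a frame grid stitched into a single image may be worth trying.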

whmc76 commented 3 days ago

https://pub.towardsai.net/youtube-video-summary-using-ollama-mistralai-96fdeed738e6 I found this and am wondering how to apply it in ComfyUI. I think you should be able to handle it, haha.
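A rough sketch of how a text-summarization step like that might carry over to frames: caption each frame with the vision model first, then hand the timestamped captions to a text model for one overall summary. This two-stage flow is an assumption about how to adapt the idea, not something taken from the linked article or from comfyui-ollama, and `build_summary_prompt` is a hypothetical helper.

```python
def build_summary_prompt(captions: list[tuple[float, str]]) -> str:
    """Turn (timestamp_seconds, caption) pairs into one summarization prompt.

    Hypothetical helper: the captions would come from running a vision model
    on each sampled frame; the returned prompt goes to a text model.
    """
    # Sort by timestamp so the text model sees events in order.
    ordered = sorted(captions)
    lines = [f"[{seconds:.1f}s] {text.strip()}" for seconds, text in ordered]
    return (
        "Below are descriptions of frames from a single video, in chronological order.\n"
        + "\n".join(lines)
        + "\n\nWrite a short summary of what happens in the whole video."
    )


if __name__ == "__main__":
    example = [(2.0, "The cat jumps onto a table."), (0.0, "A cat sits on the floor.")]
    print(build_summary_prompt(example))
```

In ComfyUI this would roughly correspond to feeding the joined vision-node outputs into a text generation node.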