showlab / VLog

Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
MIT License
538 stars 26 forks source link
chatgpt langchain large-language-model video-language whisper

🎞 VLog: Video as a Long Document

Open in Spaces

Tweet

Given a long video, we turn it into a doc containing visual + audio info. By sending this doc to ChatGPT, we can chat over the video!

vlog

News

To Do List

Done

Doing

🧸 Examples

[ News - GPT4 launch event ]GPT4 launch event
[ TV series - εΎζœδΉ‹εŽεΌΊδΉ°η“œ ]εŽεΌΊδΉ°η“œ
[ TV series - The Big Bang Theory ]The Big Bang Theory
[ Travel video - Travel in Rome ]Travel in Rome
[ Vlog - Basketball training ]Basketball training

πŸ”¨ Preparation

Please find installation instructions in install.md.

🌟 Start here

Run in cmd

python main.py --video_path examples/buy_watermelon.mp4 --openai_api_key xxxxx

The generated video document will be generated and saved in examples/buy_watermelon.log

Run in Gradio

python main_gradio.py --openai_api_key xxxxx

πŸ™‹ Suggestion

Stay tuned for our project πŸ”₯

If you have more suggestions or functions need to be implemented in this codebase, feel free to drop us an email kevin.qh.lin@gmail.com, leiwx52@gmail.com or open an issue.

😊 Acknowledgment

This work is based on ChatGPT, BLIP2, GRIT, KTS, Whisper, LangChain, Image2Paragraph.