π VLog: Video as a Long Document
Given a long video, we turn it into a doc containing visual + audio info. By sending this doc to ChatGPT, we can chat over the video!
News
- 23/April/2023: We release Huggingface gradio demo!
- 20/April/2023: We release our project on github and local gradio demo!
To Do List
Done
- [x] LLM Reasoner: ChatGPT (multilingual) + LangChain
- [x] Vision Captioner: BLIP2 + GRIT
- [x] ASR Translator: Whisper (multilingual)
- [x] Video Segmenter: KTS
- [x] Huggingface Space
Doing
- [ ] Optimize the codebase efficiency
- [ ] Improve Vision Models: MiniGPT-4 / LLaVA, Family of Segment-anything
- [ ] Improve ASR Translator for better alignment
- [ ] Introduce Temporal dependency
- [ ] Replace ChatGPT with own trained LLM
π§Έ Examples
[ News - GPT4 launch event ]
[ TV series - εΎζδΉεεΌΊδΉ°η ]
[ TV series - The Big Bang Theory ]
[ Travel video - Travel in Rome ]
[ Vlog - Basketball training ]
π¨ Preparation
Please find installation instructions in install.md.
π Start here
Run in cmd
python main.py --video_path examples/buy_watermelon.mp4 --openai_api_key xxxxx
The generated video document will be generated and saved in examples/buy_watermelon.log
Run in Gradio
python main_gradio.py --openai_api_key xxxxx
π Suggestion
Stay tuned for our project π₯
If you have more suggestions or functions need to be implemented in this codebase, feel free to drop us an email kevin.qh.lin@gmail.com
, leiwx52@gmail.com
or open an issue.
π Acknowledgment
This work is based on ChatGPT, BLIP2, GRIT, KTS, Whisper, LangChain, Image2Paragraph.