YouTube is rich in text data hidden in the subtitles included in each video. Having access to this text data is, in a way, equivalent to having access to the knowledge of the YouTuber themselves. With that in mind, one could design conversational agents that mimic the YouTuber, or retrieval-based chatbots that answer from facts spoken in the videos. In this project we chose the latter.
Using subtitles collected from ~200 James Hoffman videos, we design a chatbot capable of answering general coffee-related questions using only the knowledge contained in his videos. A first retrieval step ranks the subtitle corpus by cosine similarity of sentence embeddings; a secondary reranking is then performed with a cross-encoder. Finally, we use Flan-T5 to extract the answer from the most relevant document, or to acknowledge that the question is unanswerable.
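The two-stage retrieval above can be sketched with toy stand-ins: random vectors in place of a real sentence-embedding model and a word-overlap score in place of a real cross-encoder (both are assumptions for illustration, not the project's actual models).

```python
import numpy as np

# Toy "corpus" standing in for subtitle chunks.
docs = [
    "grind finer if the espresso shot runs too fast",
    "preheat the mug before pouring filter coffee",
    "bloom the coffee grounds with a little water first",
]

rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(len(docs), 16))  # fake sentence embeddings

def cosine_rank(query_emb, doc_embs, k=2):
    # Stage 1: cosine similarity over the whole corpus, keep the top-k.
    sims = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb))
    return np.argsort(sims)[::-1][:k]

def toy_cross_encoder(query, doc):
    # Stage 2 stand-in: score query/doc pairs jointly (here, word overlap);
    # a real cross-encoder would read both texts together and output a score.
    return len(set(query.split()) & set(doc.split()))

query = "why does my espresso shot run so fast"
query_emb = doc_embs[0] + rng.normal(scale=0.1, size=16)  # pretend embedding
candidates = cosine_rank(query_emb, doc_embs)
best = max(candidates, key=lambda i: toy_cross_encoder(query, docs[i]))
print(docs[best])  # the document then handed to Flan-T5 for answer extraction
```

The coarse-but-cheap bi-encoder stage narrows the corpus down so that the expensive pairwise reranker only sees a handful of candidates.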
A user interface is also provided through the Slack API, allowing users to interact with the bot via direct messages.
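A minimal sketch of the Slack side, assuming a Flask endpoint for the Slack Events API (the route name and handler are illustrative, not the project's actual code):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/slack/events", methods=["POST"])
def slack_events():
    payload = request.get_json()
    # Slack's Events API first sends a one-time url_verification challenge
    # that must be echoed back before it will deliver message events.
    if payload.get("type") == "url_verification":
        return jsonify({"challenge": payload["challenge"]})
    # For real message events, hand the text to the QA pipeline here and
    # post the answer back via Slack's Web API (chat.postMessage).
    return "", 200
```

In production you would also verify Slack's request signature before trusting the payload.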
```shell
pip install -r requirements.txt
```
I'm kinda new to frontend development, so I made heavy use of this playlist on YouTube for the Flask + Slack API integration. For the chromadb part, refer to their docs.
That said, the pipeline itself doesn't require Slack integration at all; I just figured it would be nice to provide an interface to everything. So if you want to get a feel for the model/workflow, check out the t5skeleton.ipynb file, which is essentially the rough draft of this project, entirely offline.
If you want to regenerate any of the data included in the clean folder, you'll need a working YouTube API key and to dig into the youtube-api.ipynb notebook. The same goes for the embeddings, but this time through t5skeleton.ipynb. Otherwise, I've already included the cleaned subtitle .vtt files as well as the embeddings I generated for the demo in the repo.
Hopefully I didn't leave any of my API keys lying around; that would be bad.