jamescadd / ChatGPT-GameRec

A GPT-powered chatbot that suggests games you'll love

Adds langchain indexer for youtube transcripts backed by faiss #11

Closed jamescadd closed 6 months ago

jamescadd commented 7 months ago

A new main.py argument, 'update-index', replaces the transcript retrieval and embedding arguments.
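For illustration, here is a minimal sketch of how such an argument could be wired up with argparse. The flag spelling `--update-index`, the `build_parser` helper, and the help text are assumptions made for this example; the actual main.py may define the argument differently (for instance as a positional command).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the real main.py may be structured differently.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--update-index",  # assumed spelling of the 'update-index' argument
        action="store_true",
        help="Retrieve transcripts and embed only the new or changed ones.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--update-index"])
```

With this sketch, `args.update_index` is true when the flag is passed and false otherwise.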

The LangChain indexer generates embeddings only for new or changed YouTube transcripts. This still uses the YouTubeDownloader to retrieve transcripts for all videos, which is slow and unnecessary, but it eliminates the cost of embedding transcripts that have not changed.
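The general idea of skipping unchanged transcripts can be sketched with a content hash per video. This is not the LangChain indexer's implementation, only an illustration of the technique; `content_hash`, `select_for_embedding`, and the in-memory `record` dict are hypothetical names introduced here.

```python
import hashlib


def content_hash(text: str) -> str:
    # Stable fingerprint of a transcript's text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_for_embedding(transcripts: dict, record: dict) -> list:
    """Return IDs of transcripts that are new or changed, updating record in place.

    transcripts maps video ID -> transcript text (all videos are still fetched).
    record maps video ID -> hash of the text that was last embedded.
    """
    to_embed = []
    for video_id, text in transcripts.items():
        digest = content_hash(text)
        if record.get(video_id) != digest:
            to_embed.append(video_id)
            record[video_id] = digest
    return to_embed


record = {}
first_run = select_for_embedding({"a": "hello", "b": "world"}, record)
second_run = select_for_embedding({"a": "hello", "b": "world!"}, record)
```

On the first run both transcripts are selected; on the second run only `"b"` is, because its text changed. Every transcript is still downloaded each time, which matches the trade-off described above: retrieval stays slow, but only changed text incurs embedding cost.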

Note that the OpenAIEmbeddings constructor parameter that set the chunk size to 25 is removed here, so the chunk size now gets the default value (currently 1000). This was done for no reason other than to try the defaults.
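As I understand it, the chunk size here is the maximum number of texts sent per embedding request, so the change mainly affects how many requests are made. A rough sketch of that arithmetic follows; the `batches` helper and the figure of 2500 texts are made up for illustration and do not come from this repository.

```python
def batches(texts, chunk_size):
    # Yield consecutive slices of at most chunk_size texts.
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


# Hypothetical workload: 2500 pieces of transcript text.
texts = [f"text {i}" for i in range(2500)]

requests_at_25 = sum(1 for _ in batches(texts, 25))
requests_at_1000 = sum(1 for _ in batches(texts, 1000))
```

Under these assumptions, a chunk size of 25 needs 100 requests while the default of 1000 needs 3.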

Using the LangChain indexer required creating an empty FAISS store, which (I think) requires an index to be provided. I chose the simplest option, "Flat", from https://github.com/facebookresearch/faiss/wiki/The-index-factory. This results in an IndexFlatL2, which uses the "exact search for L2" method described at https://github.com/facebookresearch/faiss/wiki/Faiss-indexes, but other index types are available.
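To make "exact search for L2" concrete, here is a small pure-Python sketch of what a flat index does conceptually: compare the query against every stored vector and keep the closest ones. It does not use faiss, and `squared_l2` and `flat_l2_search` are hypothetical helpers written for this example.

```python
def squared_l2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Brute-force nearest neighbours: returns up to k (distance, index) pairs."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return scored[:k]


# An empty store is valid and simply returns no results.
empty_result = flat_l2_search([], [0.9, 1.2], k=2)

vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
nearest = flat_l2_search(vectors, [0.9, 1.2], k=2)
nearest_ids = [index for _, index in nearest]
```

Here `nearest_ids` is `[1, 0]`: the vector at index 1 is closest to the query, followed by index 0. Because every vector is compared, the result is exact, at the cost of search time that grows linearly with the number of stored vectors. That is presumably acceptable for a small transcript collection; the other index types trade some accuracy for speed.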