john0isaac / rag-semantic-kernel-mongodb-vcore

A sample for implementing retrieval-augmented generation using Azure OpenAI to generate embeddings, Azure Cosmos DB for MongoDB vCore to perform vector search, and Semantic Kernel.
https://techcommunity.microsoft.com/t5/educator-developer-blog/build-rag-chat-application-using-azure-openai-and-cosmos-db-for/ba-p/4055852
MIT License
32 stars 52 forks

Check for Movies Dataset #1

Closed john0isaac closed 6 months ago

john0isaac commented 7 months ago

https://www.kaggle.com/code/johnaziz/cleaning-imdb-movies

john0isaac commented 6 months ago

It works, but the current implementation makes it impossible to search for a movie by its name; only words semantically related to the descriptions of the IMDb movies return matches.

Proposal 1: Chunk the embedded content to be Movie title: ... Movie Name: ....
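
A minimal sketch of what Proposal 1 could look like, assuming each record is a dict with hypothetical `title` and `description` keys (the real column names in the cleaned dataset may differ). The `Movie description:` label is also an assumption; the point is only that the title becomes part of the text that gets embedded, so a query containing the movie name can match.

```python
def build_embedding_text(movie: dict) -> str:
    """Combine the title and description into the single string to embed.

    The "title" and "description" keys are hypothetical field names.
    """
    title = movie.get("title", "").strip()
    description = movie.get("description", "").strip()
    # Putting the title in the embedded text makes name-based queries matchable.
    return f"Movie title: {title}\nMovie description: {description}"


# Placeholder record for illustration only, not taken from the dataset.
sample = {"title": "Example Movie", "description": "A placeholder plot summary."}
text = build_embedding_text(sample)
print(text)
```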

Proposal 2: Display additional_metadata in the response. [Might not work for all cases]
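
Proposal 2 could be sketched roughly as follows. `SearchResult` here is a hypothetical stand-in for whatever object the memory search returns, not the library's actual type; it only assumes that a result carries its text plus an optional `additional_metadata` string.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical stand-in for a memory search result."""
    text: str
    additional_metadata: str = ""


def format_response(result: SearchResult) -> str:
    """Append the metadata to the answer text when it is present."""
    if result.additional_metadata:
        return f"{result.text}\n(Metadata: {result.additional_metadata})"
    # Records without metadata fall back to the plain text,
    # which is one reason this may not work for all cases.
    return result.text


with_meta = format_response(SearchResult("A placeholder plot summary.", "Example Movie"))
without_meta = format_response(SearchResult("A placeholder plot summary."))
```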

Re sample size: only the 2000-record sample is reasonable, as it might take 30 minutes to generate embeddings for and store the 7000 records.
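
As a rough back-of-the-envelope check, assuming embedding and storage time scales linearly with the number of records (an assumption, since batching and rate limits can change this), the 30-minute figure for 7000 records can be scaled down to the smaller samples:

```python
def estimate_minutes(n_records: int, ref_records: int = 7000, ref_minutes: float = 30.0) -> float:
    """Scale the reference timing linearly to a different record count."""
    return n_records * ref_minutes / ref_records


# Under the linear assumption, 2000 records take roughly 8.6 minutes.
minutes_2000 = estimate_minutes(2000)
# A ~100-record demo subset takes well under a minute.
minutes_100 = estimate_minutes(100)
```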

During the live demo, we can interrupt the embedding of the data from the notebook and just add a reasonable ~100 records.
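
An alternative to interrupting the notebook cell by hand would be to cap the number of records up front. This is only a sketch: `take_sample` is a hypothetical helper, and the records are assumed to arrive as any iterable (for example, rows read from the cleaned dataset).

```python
from itertools import islice


def take_sample(records, limit: int = 100) -> list:
    """Return at most `limit` records without reading the rest of the iterable."""
    return list(islice(records, limit))


# Stand-in for the 7000-record dataset; only the first 100 are kept for the demo.
demo_records = take_sample(range(7000), limit=100)
```

Only the returned subset would then be passed on to the embedding step.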