Note This is not intended to be production-ready, or even PoC-ready. It's just a fun experiment!
This repo contains a Python notebook that shows how you can integrate MongoDB with LlamaIndex to use your own private data with tools like ChatGPT. Your data is fed to the LLM using a technique called "in-context learning". To do so, we leverage the Mongo loader available in LlamaHub. A big part of this exercise was to demonstrate how you can use locally running models, such as HuggingFace transformers and GPT4All, instead of sending your data to OpenAI. All the code can be executed entirely on CPU.
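To give an idea of the flow, here is a minimal sketch of the loading and indexing step (the notebook is the source of truth). It assumes the LlamaHub `SimpleMongoReader` loader, a placeholder `MONGO_URI` connection string, and the `embed_model="local"` shortcut for a local HuggingFace embedding model; exact imports and arguments depend on your `llama-index` version.

```python
from llama_index import VectorStoreIndex, ServiceContext, download_loader

# Placeholder connection string -- replace with your own Atlas URI.
MONGO_URI = "mongodb+srv://<user>:<password>@<cluster>.mongodb.net/"

# Pull the Mongo loader from LlamaHub and read the movies collection.
SimpleMongoReader = download_loader("SimpleMongoReader")
reader = SimpleMongoReader(uri=MONGO_URI)
documents = reader.load_data(
    "sample_mflix",                 # database
    "movies",                       # collection
    field_names=["title", "plot"],  # fields turned into document text
)

# "local" asks LlamaIndex to embed documents with a local HuggingFace
# model, so no document text is sent to OpenAI at indexing time.
service_context = ServiceContext.from_defaults(embed_model="local")
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```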
The steps are explained in the notebook, but in short: I leveraged the sample_mflix.movies collection, part of the sample dataset available in MongoDB Atlas. We index the documents in that collection, and on top of them I added a fictitious document for a fictitious movie called "The Paolo Picello movie", describing the life of a Solutions Architect trying to build cool apps with AI and MongoDB.
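Inserting that fictitious document can be done with a few lines of pymongo; the field names below simply mirror the sample_mflix schema, and the plot text is an example.

```python
from pymongo import MongoClient

# Placeholder connection string -- same idea as in the loading sketch above.
MONGO_URI = "mongodb+srv://<user>:<password>@<cluster>.mongodb.net/"

client = MongoClient(MONGO_URI)
movies = client["sample_mflix"]["movies"]

# A made-up document, so we can later check whether the LLM picks it up.
movies.insert_one({
    "title": "The Paolo Picello movie",
    "plot": "The life of a Solutions Architect trying to build cool apps "
            "with AI and MongoDB.",
})
```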
I then asked the system the following question:
"What is the name of the movie that talks about a computer engineer trying to build a demo of how you can leverage AI tools to answer questions around data stored in MongoDB?"
and the system answered:
The name of the movie is "PaoLo Picello".
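In code, that round trip is a single query against the index built earlier. One way to keep generation local is to plug GPT4All in through LangChain, roughly as below; the model file path, the `LangChainLLM` wrapper, and passing `service_context` into `as_query_engine` are assumptions that depend on your `llama-index`/`langchain` versions, so treat this as a sketch rather than the exact notebook wiring.

```python
from langchain.llms import GPT4All
from llama_index import ServiceContext
from llama_index.llms import LangChainLLM

# Wrap a locally downloaded GPT4All model so LlamaIndex can use it for
# generation; the .bin path below is just an example file name.
local_llm = LangChainLLM(llm=GPT4All(model="./ggml-gpt4all-j-v1.3-groovy.bin"))
service_context = ServiceContext.from_defaults(llm=local_llm, embed_model="local")

# Reuse the index built in the loading sketch and answer fully on CPU.
query_engine = index.as_query_engine(service_context=service_context)
response = query_engine.query(
    "What is the name of the movie that talks about a computer engineer "
    "trying to build a demo of how you can leverage AI tools to answer "
    "questions around data stored in MongoDB?"
)
print(response)
```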
Interestingly, the system was able to pull my name out of the indexed documents. It is not the exact title we specified in the MongoDB document ("The Paolo Picello movie"), but it's still quite an impressive result.
Note The system hallucinates quite a lot, giving pretty random results most of the time. But it's still fascinating to see it surface my name in the response.
This notebook is inspired by the LlamaIndex - Local Model Demo.ipynb notebook referenced in the LlamaIndex documentation.
We welcome comments and contributions!