hwchase17 / notion-qa

MIT License

Ingesting your own dataset #2

Closed: bouiboui closed this 1 year ago

bouiboui commented 1 year ago

Hi, when ingesting my own dataset, I understand that I must remove/replace Export*.zip and Notion_DB. Should I also remove faiss_store.pkl? (I've never used faiss) Are there other things that need to be removed/reset first? Thanks!

jasielmacedo commented 1 year ago

From my understanding of the code and the tests I ran, Export*.zip seems to be a leftover, because all the content lives in the Notion_DB folder. If you replace the contents of Notion_DB with your own data (keeping the folder name, of course) and run ingest.py with your OpenAI API key, faiss_store.pkl will be regenerated automatically.
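A minimal sketch of that workflow, assuming ingest.py reads the key from the `OPENAI_API_KEY` environment variable and that your Notion export is Markdown (`.md` files) somewhere under Notion_DB. Both are assumptions to check against your checkout. `check_notion_db`, `build_ingest_env` and `run_ingest` are hypothetical helper names, not part of the repo.

```python
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path


def check_notion_db(repo_dir: Path) -> list[Path]:
    """Fail early if Notion_DB is missing or holds no Markdown files (assumed export format)."""
    notion_db = repo_dir / "Notion_DB"
    if not notion_db.is_dir():
        raise FileNotFoundError(f"{notion_db} not found; keep the folder name Notion_DB")
    md_files = sorted(notion_db.rglob("*.md"))
    if not md_files:
        raise ValueError(f"no .md files under {notion_db}")
    return md_files


def build_ingest_env(api_key: str, base_env: dict | None = None) -> dict:
    """Copy the environment and set OPENAI_API_KEY (assumed to be how ingest.py gets the key)."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENAI_API_KEY"] = api_key
    return env


def run_ingest(repo_dir: Path, api_key: str) -> None:
    """Validate Notion_DB, then run ingest.py from the repo root; raises if it exits non-zero."""
    check_notion_db(repo_dir)
    subprocess.run(
        [sys.executable, "ingest.py"],
        cwd=repo_dir,
        env=build_ingest_env(api_key),
        check=True,
    )
```

Usage would be something like `run_ingest(Path("notion-qa"), "sk-...")`. Checking Notion_DB first avoids spending embedding API calls on an empty or misnamed folder.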

bouiboui commented 1 year ago

But does it mean it will also have Blendle's employee handbook information?

jasielmacedo commented 1 year ago

Yes and no: the old vectors could still be there locally, but to use them you have to call the OpenAI Embedding API to generate those vectors on the OpenAI side as well. Still, I recommend deleting faiss_store.pkl; running ingest.py will generate a new one anyway.
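A small sketch of the cleanup suggested in this thread: delete faiss_store.pkl and any leftover Export*.zip before re-running ingest.py, so the new store is built only from your Notion_DB. It assumes both files sit at the repository root, as they appear to in the original checkout. `remove_stale_artifacts` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path
from typing import List


def remove_stale_artifacts(repo_dir: Path) -> List[Path]:
    """Delete faiss_store.pkl and Export*.zip at the repo root (assumed location); return what was removed."""
    targets = [repo_dir / "faiss_store.pkl", *sorted(repo_dir.glob("Export*.zip"))]
    removed = []
    for path in targets:
        # Skip anything already gone so the function is safe to call repeatedly.
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling it a second time returns an empty list. Notion_DB itself is left alone, since you replace its contents by hand.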

bouiboui commented 1 year ago

Alright, thanks!