pbl-nl / appl-docchat

Chat with your own documents pressure cooker
https://github.com/pbl-nl
MIT License

Custom pd retriever #157

Closed: StefanTroost closed this 4 weeks ago

StefanTroost commented 1 month ago

The custom ParentDocumentRetriever first splits the text into chunks and then further splits each chunk into child chunks. The child chunks and their embeddings are stored in the vector database in the normal way. However, each child chunk's associated "parent chunk" and the parent chunk's embedding are stored as metadata on the child chunk document. At query time, the ParentDocumentRetriever compares the distance between the user's prompt and the child chunks, but retrieves the associated parent chunks to put into the context for the LLM.
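A minimal sketch of the idea in plain Python follows. It assumes a simple word-count splitter and a toy bag-of-words "embedding" in place of the real text splitter, embedding model and vector database; all names here (`build_store`, `retrieve_parents`, `ChildDoc`, the `parent_chunk` and `parent_embedding` metadata keys) are hypothetical and not taken from the PR. Child chunks are scored against the query, but the parent chunks stored in their metadata are what gets returned.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def split_text(text: str, chunk_size: int) -> list:
    """Split text into chunks of at most chunk_size words (stand-in for a real splitter)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class ChildDoc:
    page_content: str
    embedding: Counter
    metadata: dict = field(default_factory=dict)


def build_store(text: str, parent_size: int = 40, child_size: int = 10) -> list:
    """Index child chunks; keep each parent chunk and its embedding as child metadata."""
    store = []
    for parent in split_text(text, parent_size):
        parent_embedding = embed(parent)
        for child in split_text(parent, child_size):
            store.append(ChildDoc(child, embed(child),
                                  {"parent_chunk": parent, "parent_embedding": parent_embedding}))
    return store


def retrieve_parents(store: list, query: str, k: int = 2) -> list:
    """Rank children by similarity to the query, then return up to k distinct parents."""
    q = embed(query)
    ranked = sorted(store, key=lambda d: cosine(q, d.embedding), reverse=True)
    parents = []
    for doc in ranked:
        parent = doc.metadata["parent_chunk"]
        if parent not in parents:  # several children may share one parent
            parents.append(parent)
        if len(parents) == k:
            break
    return parents


text = "apples grow on trees in orchards rockets launch from pads into orbit"
store = build_store(text, parent_size=6, child_size=3)
print(retrieve_parents(store, "rockets orbit", k=1))
```

The deduplication step matters because the small child chunks give precise matches, while several of them can point to the same parent; returning each parent once keeps the LLM context free of repeated text.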