i-dot-ai / redbox

Bringing Generative AI to the way the Civil Service works
https://i-dot-ai.github.io/redbox/
MIT License

Added new chunk resolution to allow using summarisation level chunks #793

Closed: jamesrichards4 closed this 1 month ago

jamesrichards4 commented 1 month ago

Context

Chat over documents requires retrieving the full text of each document. We get better results when that text is retrieved from larger chunks produced with simpler chunking.

Changes proposed in this pull request

Create an additional set of chunks for use by the summarisation and chat-over-documents chains. To support this change:

Tests are now document-based rather than chunk-based. Not all of the backwards-compatibility code has been removed, however, because some chunk API functions have been retained. These should be removed in a future refactor.

Moved the ingest chain definitions to redbox-core, to match the move already made for core-api.
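
For reviewers, here is a rough sketch of the idea behind the extra chunk set. It is not the code in this pull request: `ChunkResolution`, `Chunk`, `split_text`, `ingest`, `document_text` and the character sizes are all hypothetical names and values chosen for illustration. Each document is chunked twice, once at a normal resolution for retrieval and once at a larger resolution, and the full document text is rebuilt from the larger chunks only.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkResolution(str, Enum):
    # Hypothetical labels; the real project may name its resolutions differently.
    normal = "normal"
    largest = "largest"


# Assumed maximum chunk sizes in characters, for illustration only.
SIZES = {ChunkResolution.normal: 200, ChunkResolution.largest: 2000}


@dataclass(frozen=True)
class Chunk:
    file_name: str
    index: int
    resolution: ChunkResolution
    text: str


def split_text(text: str, max_chars: int) -> list[str]:
    """Pack whole words greedily into pieces of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own piece.
    """
    pieces: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def ingest(file_name: str, text: str) -> list[Chunk]:
    """Create one set of chunks per resolution for the same document."""
    chunks: list[Chunk] = []
    for resolution, max_chars in SIZES.items():
        for i, piece in enumerate(split_text(text, max_chars)):
            chunks.append(Chunk(file_name, i, resolution, piece))
    return chunks


def document_text(
    chunks: list[Chunk],
    file_name: str,
    resolution: ChunkResolution = ChunkResolution.largest,
) -> str:
    """Rebuild a document's text from its chunks at a single resolution."""
    selected = sorted(
        (c for c in chunks if c.file_name == file_name and c.resolution == resolution),
        key=lambda c: c.index,
    )
    return " ".join(c.text for c in selected)
```

Filtering on a single resolution matters here: mixing both chunk sets when rebuilding a document would duplicate its text.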

Guidance to review

Things to check