wyona / katie-backend

Katie Backend
https://katie.qa
Apache License 2.0

Improve Chunking #26

Open · michaelwechner opened this issue 2 months ago

michaelwechner commented 2 months ago

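Compare the character text splitter of Katie's segmentation controller (Swagger UI of a local instance)

http://localhost:8044/swagger-ui/#/segmentation-controller/getCharacterTextSplitterUsingPOST

with the LangChain document transformers (JS and Python, the latter including the semantic chunker)

https://js.langchain.com/docs/modules/data_connection/document_transformers/
https://python.langchain.com/docs/modules/data_connection/document_transformers/semantic-chunker/

and the LlamaIndex node parsers

https://docs.llamaindex.ai/en/v0.10.19/api_reference/service_context/node_parser.html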

Also see https://towardsdatascience.com/how-to-chunk-text-data-a-comparative-analysis-3858c4a0997a
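As a baseline for the comparison, here is a minimal sketch of a character-based splitter in the spirit of LangChain's CharacterTextSplitter. It splits on a separator and then greedily merges the pieces into chunks of at most `chunk_size` characters, repeating up to `chunk_overlap` characters of trailing pieces at the start of the next chunk. The function name and its parameters are hypothetical. This is not Katie's endpoint or LangChain's actual code, only an illustration of the technique.

```python
def split_text(text: str, separator: str = "\n\n",
               chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Hypothetical character splitter (not Katie's or LangChain's code).

    Splits `text` on `separator`, then merges pieces into chunks of at most
    `chunk_size` characters. Up to `chunk_overlap` characters of trailing
    pieces are repeated at the start of the next chunk. A single piece longer
    than `chunk_size` is kept as its own oversized chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []  # pieces of the chunk being built
    length = 0               # == len(separator.join(current))

    for piece in pieces:
        added = len(piece) + (len(separator) if current else 0)
        if current and length + added > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until what remains fits the overlap
            # budget and leaves room for the new piece.
            while current and (length > chunk_overlap
                               or length + len(separator) + len(piece) > chunk_size):
                removed = current.pop(0)
                length -= len(removed) + (len(separator) if current else 0)
            added = len(piece) + (len(separator) if current else 0)
        current.append(piece)
        length += added

    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10, chunk_overlap=4))
# → ['aaaa\n\nbbbb', 'bbbb\n\ncccc']
```

The overlap means that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks. Semantic chunkers, in contrast, choose boundaries from changes in embedding similarity rather than from a fixed character budget.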

michaelwechner commented 3 weeks ago

Also see "Sentence Window Retrieval"

https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo/

https://towardsdatascience.com/the-challenges-of-retrieving-and-evaluating-relevant-context-for-rag-e362f6eaed34
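Sentence Window Retrieval, as shown in the LlamaIndex metadata replacement demo, matches the query against individual sentences but hands the LLM a window of surrounding sentences for each hit. Small units match precisely, and the window restores context. The following is a minimal sketch of that idea. It makes two assumptions: it splits sentences with a naive regex, and it uses word overlap as a stand-in for embedding similarity. `SentenceNode`, `build_nodes` and `retrieve` are hypothetical names and not part of the LlamaIndex API.

```python
import re
from dataclasses import dataclass


@dataclass
class SentenceNode:
    sentence: str  # single sentence, used for matching against the query
    window: str    # sentence plus neighbours, returned as context


def build_nodes(text: str, window_size: int = 1) -> list[SentenceNode]:
    """Split text into sentences (naive regex, an assumption) and attach
    `window_size` neighbouring sentences on each side as the window."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append(SentenceNode(sentence, " ".join(sentences[lo:hi])))
    return nodes


def _words(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))


def retrieve(nodes: list[SentenceNode], query: str, top_k: int = 1) -> list[str]:
    """Rank nodes by word overlap between the query and the single sentence
    (stand-in for embedding similarity), then return the windows instead."""
    q = _words(query)
    ranked = sorted(nodes, key=lambda n: len(q & _words(n.sentence)), reverse=True)
    return [n.window for n in ranked[:top_k]]


text = ("Katie is a chatbot. It answers questions. "
        "Chunking splits documents. Small chunks match precisely.")
print(retrieve(build_nodes(text), "How does chunking split documents?"))
# → ['It answers questions. Chunking splits documents. Small chunks match precisely.']
```

In a real pipeline the sentence would be embedded and stored in the vector index, and the window kept as metadata that replaces the node's text after retrieval.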

michaelwechner commented 2 weeks ago

Also see https://medium.datadriveninvestor.com/new-chunking-method-for-rag-systems-2eb3523d0420