jncraton / languagemodels

Explore large language models in 512MB of RAM
https://jncraton.github.io/languagemodels/
MIT License

Split chunks on semantic meaning #6

Closed jncraton closed 1 year ago

jncraton commented 1 year ago

This change adjusts the chunking algorithm to split on sentence boundaries where possible, rather than on arbitrary token boundaries. Keeping sentences intact within a chunk improves retrieval performance.
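To illustrate the idea, here is a minimal sketch of sentence-aware chunking. It is not the code from this change: the function name `chunk_by_sentence`, the `max_tokens` parameter, the regex-based sentence split, and the use of whitespace-separated words as a stand-in for model tokens are all assumptions made for the example.

```python
import re


def chunk_by_sentence(text, max_tokens=64):
    """Group whole sentences into chunks of at most max_tokens words.

    Hypothetical helper: words split on whitespace stand in for real tokens.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()

        # Close the current chunk if this sentence would push it past the limit
        if current and count + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0

        # A single sentence longer than the limit falls back to fixed-size splits
        while len(words) > max_tokens:
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]

        current.extend(words)
        count += len(words)

    # Emit whatever is left over as the final chunk
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six seven. Eight."
    for chunk in chunk_by_sentence(sample, max_tokens=4):
        print(chunk)
```

With a limit of 4 words, the sample text yields one chunk per sentence. With a larger limit, adjacent sentences are merged until the next one would no longer fit. Only a sentence that exceeds the limit on its own is cut at an arbitrary boundary.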