nickthecook / archyve

GNU Affero General Public License v3.0

Text chunking using recursive text splitting for plain text and markdown #40

Closed: oxaroky02 closed this pull request 4 months ago
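For readers unfamiliar with the technique named in the title, here is a minimal sketch of recursive text splitting. It tries coarse separators first (paragraph breaks), and only falls back to finer ones (line breaks, spaces, then a hard character cut) for pieces that are still too long. The function name `recursive_split`, the `chunk_size` limit in characters, and the separator list are all assumptions for illustration. The separators archyve actually uses for plain text and markdown are not stated in this PR.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ", "")) -> list:
    """Hypothetical sketch: split text into chunks of at most chunk_size chars,
    preferring the coarsest separator that keeps pieces under the limit."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks = []
    current = ""
    for piece in text.split(sep):
        # Try to extend the current chunk with this piece.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        # The piece doesn't fit: flush what we have so far.
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too big on its own: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

For example, `recursive_split("para one is here.\n\npara two is a bit longer here.\n\nthree", 20)` keeps the first paragraph whole, breaks only the over-long second paragraph on spaces, and returns `["para one is here.", "para two is a bit", "longer here.", "three"]`. Because paragraph and line boundaries are preferred over arbitrary word positions, chunks tend to end at natural breaks, which is what makes them more readable than fixed word windows.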

oxaroky02 commented 4 months ago
oxaronick commented 4 months ago

These chunks are readable!

oxaronick commented 4 months ago

Before I dive in, is it expected that PDF files would still use simple word splitting?

oxaroky02 commented 4 months ago

> Before I dive in, is it expected that PDF files would still use simple word splitting?

For this PR, yes. The PDF conversion yields text with ... oddness. I haven't really tested that with the new splitter, and I'd like more time for that.
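For comparison, a minimal sketch of "simple word splitting", assuming it means grouping fixed-size runs of whitespace-separated words. The function name `word_split` and the `words_per_chunk` parameter are hypothetical, and the PR does not say how archyve's existing splitter sizes its chunks.

```python
def word_split(text: str, words_per_chunk: int) -> list:
    """Hypothetical sketch: chunk text into groups of words_per_chunk words."""
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings.
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]
```

Because it splits on any whitespace, this approach ignores paragraph and line structure entirely, so chunk boundaries can fall mid-sentence. For example, `word_split("a b c d e", 2)` returns `["a b", "c d", "e"]`.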