This PR adds a `WithHeadingHierarchy` option to the Markdown textsplitter. When enabled, each chunk of a Markdown document is prefixed with its full heading hierarchy, not just its current heading.
This improves retrieval of relevant chunks when a query matches a higher-level heading.
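To show why the prefix helps, here is a rough sketch that stands in for retrieval with a deliberately crude lexical check. `matches` is a hypothetical helper that is not part of this PR. It reports whether every query term appears as a whitespace-separated token in a chunk. Real retrieval would compare embeddings, so this only shows which chunk text can match a higher-level heading at all. The "current heading only" chunk format is also an assumption about the behavior without the option.

```go
package main

import (
	"fmt"
	"strings"
)

// matches is a hypothetical, naive stand-in for retrieval: it returns true
// only if every whitespace-separated query term is a token of the chunk.
func matches(chunk, query string) bool {
	tokens := map[string]bool{}
	for _, t := range strings.Fields(chunk) {
		tokens[t] = true
	}
	for _, q := range strings.Fields(query) {
		if !tokens[q] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed chunk shapes for the "spam eggs" section of the example.
	currentOnly := "#### h4\nspam eggs"
	withHierarchy := "# h1\n## h2\n#### h4\nspam eggs"

	for _, q := range []string{"h1", "h2", "h4"} {
		fmt.Printf("%s: current-only=%v hierarchy=%v\n",
			q, matches(currentOnly, q), matches(withHierarchy, q))
	}
}
```

Running it prints `h1: current-only=false hierarchy=true`, `h2: current-only=false hierarchy=true` and `h4: current-only=true hierarchy=true`. Only the chunk carrying the hierarchy can be found through its parent headings.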
Example
Input:

```markdown
# h1
foobar
## h2
bazbom
#### h4
spam eggs
```

With `WithHeadingHierarchy` enabled, the last chunk is:

```markdown
# h1
## h2
#### h4
spam eggs
```

A semantic search for h1, h2, or h4 is therefore likely to return this chunk.
PR Checklist
- [x] Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as `memory: add interfaces for X, Y` or `util: add whizzbang helpers`).
- [x] Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
- [x] Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. `Fixes #123`).
- [ ] Describes the source of new concepts.
- [ ] References existing implementations as appropriate.