This PR adds a basic Markdown text splitter that splits on headings only.
You can choose the maximum heading level to split on. Resulting chunks that are larger than the configured chunkSize are split further by a secondary splitter (a recursiveCharacterSplitter by default).
Every chunk is prefixed with its full Markdown heading hierarchy to improve semantic search results.
Optionally, chunks that consist of headings only (i.e. no content) can be dropped.
The code follows the functional options pattern, like the golc and langchaingo libraries, for consistency.
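
For reviewers who want a feel for the behavior, here is a minimal sketch of the idea, not the code in this PR. Every identifier in it (`HeadingSplitter`, `WithMaxHeadingLevel`, `WithChunkSize`, `WithSkipHeadingsOnly`, `WithSecondarySplitter`, `splitFixed`) is hypothetical, the defaults are made up, and a naive fixed-width splitter stands in for the recursive character splitter. It also assumes ATX-style headings (`# Title`) and does not treat fenced code blocks specially.

```go
package main

import "strings"

// Option configures a HeadingSplitter (functional options pattern).
type Option func(*HeadingSplitter)

// HeadingSplitter is a hypothetical, simplified stand-in for the splitter in this PR.
type HeadingSplitter struct {
	maxLevel         int                                  // split on headings up to this level
	chunkSize        int                                  // max runes of content per chunk
	skipHeadingsOnly bool                                 // drop chunks that have no content
	secondary        func(text string, size int) []string // splits oversized content
}

func WithMaxHeadingLevel(n int) Option { return func(s *HeadingSplitter) { s.maxLevel = n } }
func WithChunkSize(n int) Option       { return func(s *HeadingSplitter) { s.chunkSize = n } }
func WithSkipHeadingsOnly(b bool) Option {
	return func(s *HeadingSplitter) { s.skipHeadingsOnly = b }
}
func WithSecondarySplitter(f func(string, int) []string) Option {
	return func(s *HeadingSplitter) { s.secondary = f }
}

// NewHeadingSplitter applies the options on top of assumed defaults.
func NewHeadingSplitter(opts ...Option) *HeadingSplitter {
	s := &HeadingSplitter{maxLevel: 2, chunkSize: 512, secondary: splitFixed}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

// splitFixed cuts text into pieces of at most size runes (naive placeholder).
func splitFixed(text string, size int) []string {
	if size <= 0 {
		return []string{text}
	}
	r := []rune(text)
	var out []string
	for len(r) > size {
		out = append(out, string(r[:size]))
		r = r[size:]
	}
	if len(r) > 0 {
		out = append(out, string(r))
	}
	return out
}

// headingLevel returns the ATX heading level (1-6) of line, or 0 if it is not a heading.
func headingLevel(line string) int {
	n := 0
	for n < len(line) && line[n] == '#' {
		n++
	}
	if n == 0 || n > 6 || n == len(line) || line[n] != ' ' {
		return 0
	}
	return n
}

// Split cuts md at headings up to maxLevel and prefixes each chunk
// with the heading hierarchy that was active for its content.
func (s *HeadingSplitter) Split(md string) []string {
	var chunks, body, stack []string // stack[i] holds the current heading of level i+1
	flush := func() {
		content := strings.TrimSpace(strings.Join(body, "\n"))
		body = nil
		var heads []string
		for _, h := range stack {
			if h != "" { // skipped levels are stored as empty strings
				heads = append(heads, h)
			}
		}
		prefix := strings.Join(heads, "\n")
		if content == "" {
			if prefix != "" && !s.skipHeadingsOnly {
				chunks = append(chunks, prefix)
			}
			return
		}
		parts := []string{content}
		if len([]rune(content)) > s.chunkSize {
			parts = s.secondary(content, s.chunkSize)
		}
		for _, p := range parts {
			if prefix != "" {
				p = prefix + "\n" + p
			}
			chunks = append(chunks, p)
		}
	}
	for _, line := range strings.Split(md, "\n") {
		lvl := headingLevel(line)
		if lvl == 0 || lvl > s.maxLevel {
			body = append(body, line) // deeper headings stay in the content
			continue
		}
		flush()
		if len(stack) >= lvl {
			stack = stack[:lvl-1] // leave the previous section at this level
		}
		for len(stack) < lvl-1 {
			stack = append(stack, "")
		}
		stack = append(stack, line)
	}
	flush()
	return chunks
}
```

Under these assumptions, `NewHeadingSplitter(WithMaxHeadingLevel(2), WithSkipHeadingsOnly(true)).Split(doc)` returns one chunk per section with content, each starting with its parent headings. Passing a different function to `WithSecondarySplitter` swaps out how oversized sections are broken up without touching the heading logic.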