Background
The Chunker library currently uses a Recursive Split strategy for text segmentation, which preserves semantic integrity and supports customizable overlap. There is also a need for a simpler, basic splitting strategy: one that splits text into chunks purely by size, with no overlap and no regard for semantic boundaries, for use cases that only require direct fragmentation of text.
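To make the intended behavior concrete, here is a minimal sketch of size-only splitting. It is written in Python purely for illustration and is not the library's implementation or API. The `Chunk` class and the `basic_split` function are hypothetical names. The sketch also assumes that `chunk_size` counts characters, that byte offsets refer to the UTF-8 encoding, and that `end_byte` is exclusive; the ticket does not specify any of these.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Hypothetical stand-in for the library's chunk type.
    text: str
    start_byte: int  # offset of the chunk's first byte in the UTF-8 encoded input
    end_byte: int    # offset one past the chunk's last byte (exclusive; an assumption)


def basic_split(text: str, chunk_size: int) -> List[Chunk]:
    """Split text into consecutive, non-overlapping chunks of at most
    chunk_size characters, ignoring semantic boundaries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")

    chunks: List[Chunk] = []
    byte_offset = 0
    for start in range(0, len(text), chunk_size):
        piece = text[start:start + chunk_size]
        piece_bytes = len(piece.encode("utf-8"))
        chunks.append(Chunk(piece, byte_offset, byte_offset + piece_bytes))
        byte_offset += piece_bytes
    return chunks


# Example: multi-byte characters make byte offsets differ from character offsets.
chunks = basic_split("héllo wörld", 4)
for chunk in chunks:
    print(repr(chunk.text), chunk.start_byte, chunk.end_byte)
```

Because there is no overlap, concatenating the chunk texts in order reproduces the input exactly, and each chunk's `start_byte` equals the previous chunk's `end_byte`. Both properties are easy to assert in tests for the new strategy.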
Acceptance Criteria
Scenario: Implementing Basic Text-Splitting Strategy
Given I am a developer looking to include basic text-splitting logic
[ ] When I add the `BasicSplit` strategy into the Chunker library
[ ] Then the `split` function should offer a `:strategy` option that accepts `BasicSplit` as a value
[ ] And when `BasicSplit` is selected, the text should be divided into chunks strictly by the `:chunk_size` without overlap
[ ] And the output should be a list of `Chunks`, each with appropriate `start_byte` and `end_byte` attributes
[ ] And the existing `RecursiveSplit` functionality remains unaffected
[ ] And the library documentation should be updated to instruct users on choosing and using the `BasicSplit` strategy.

created by jackson.oberkirch+demo@revelry.co using Prodops