ipfs / go-ipfs-chunker

go-ipfs-chunkers provides Splitter implementations for data before being ingested to IPFS
MIT License

Splitting texts on line boundaries #10

Closed dbw9580 closed 1 year ago

dbw9580 commented 5 years ago

When working with text-based data files, e.g. CSV records and JSON streams, I think it would be more meaningful to break on line boundaries: each resulting piece is then a complete dataset by itself and is easier to process, without the hassle of concatenating two parts back together. So I think a chunker that recognizes newlines and lets you specify "min lines" and "max lines" parameters (like the min/max parameters of the Rabin chunker) would be very helpful.

Stebalien commented 5 years ago

Sounds reasonable, although I'd expect it to have max-bytes, min-bytes, and avg-bytes parameters.
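
For illustration, here is a minimal sketch of what a line-aware chunker with byte limits could look like. It is not code from this repository: `LineSplitter`, `NewLineSplitter`, and `splitAll` are hypothetical names, and the `NextBytes` method is only loosely modelled on the shape of a Splitter. The sketch assumes `maxBytes >= 1`, cuts at the first newline once at least `minBytes` have accumulated, and falls back to a hard cut at `maxBytes` when a line is too long. An avg-bytes target is left out to keep it short.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// LineSplitter is a hypothetical chunker (not part of go-ipfs-chunker).
// It tries to end every chunk on a '\n' while keeping chunk sizes
// between minBytes and maxBytes. Assumes maxBytes >= 1.
type LineSplitter struct {
	r        *bufio.Reader
	minBytes int
	maxBytes int
}

// NewLineSplitter wraps r in a buffered reader so byte-wise reads stay cheap.
func NewLineSplitter(r io.Reader, minBytes, maxBytes int) *LineSplitter {
	return &LineSplitter{r: bufio.NewReader(r), minBytes: minBytes, maxBytes: maxBytes}
}

// NextBytes returns the next chunk, or io.EOF once the input is exhausted.
func (s *LineSplitter) NextBytes() ([]byte, error) {
	var chunk []byte
	for len(chunk) < s.maxBytes {
		b, err := s.r.ReadByte()
		if err == io.EOF {
			break // flush whatever has been collected so far
		}
		if err != nil {
			return nil, err
		}
		chunk = append(chunk, b)
		// Cut on a line boundary as soon as the minimum size is met.
		if b == '\n' && len(chunk) >= s.minBytes {
			return chunk, nil
		}
	}
	if len(chunk) == 0 {
		return nil, io.EOF
	}
	// Either maxBytes was reached mid-line or the input ended.
	return chunk, nil
}

// splitAll is a small helper that drains a splitter into a slice of strings.
func splitAll(input string, minBytes, maxBytes int) ([]string, error) {
	s := NewLineSplitter(strings.NewReader(input), minBytes, maxBytes)
	var out []string
	for {
		c, err := s.NextBytes()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, string(c))
	}
}

func main() {
	chunks, err := splitAll("a,1\nb,2\nc,3\n", 5, 16)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With a minimum of 5 bytes, the first two 4-byte records end up in one chunk and the last record is flushed at end of input, so every chunk still ends on a newline. A line longer than `maxBytes` is the one case where this sketch splits mid-line.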

hacdias commented 1 year ago

This repository is no longer maintained and has been copied over to Boxo. In an effort to avoid noise and crippling in the Boxo repo from the weight of issues of the past, we are closing most issues and PRs in this repo. Please feel free to open a new issue in Boxo (and reference this issue) if resolving this issue is still critical for unblocking or improving your use case.

You can learn more in the FAQs for the Boxo repo copying/consolidation effort.