This PR adds a new iterator, UnicodeWordIndices (and the corresponding function unicode_word_indices). It is similar to UnicodeWords but also yields the byte offset of each word.
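To illustrate the shape of the output, here is a minimal std-only sketch. The helper name approx_word_indices is hypothetical and is not part of this PR; it treats any run of alphanumeric characters as a word, which only approximates the UAX #29 word boundaries that the real iterator follows. It is meant to show the (byte offset, word) pairs, not the actual implementation.

```rust
/// Hypothetical stand-in for illustration only: returns (byte offset, word)
/// pairs, where a "word" is a maximal run of alphanumeric characters.
/// Assumption: this is a simplification; the real iterator uses Unicode
/// word-boundary rules (UAX #29), which differ on inputs such as "can't".
fn approx_word_indices(s: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    // Byte offset where the current word started, if we are inside one.
    let mut start: Option<usize> = None;

    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(begin) = start.take() {
            // A non-alphanumeric character ends the current word.
            out.push((begin, &s[begin..i]));
        }
    }

    // Flush a word that runs to the end of the input.
    if let Some(begin) = start {
        out.push((begin, &s[begin..]));
    }
    out
}
```

The offsets are byte offsets rather than character counts, so they can be used directly to slice the original string, including when it contains multi-byte characters.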
The motivation for this PR came from working on https://github.com/jonathandturner/reedline/pull/5, in which I used split_word_bound_indices and then filtered the result using logic that is internal to unicode_words. I believe that PR would have been trivial with unicode_word_indices. Hopefully it can also be useful to others.
Should I add more tests for unicode_word_indices? Or are the existing tests for unicode_words and the doc test for unicode_word_indices sufficient?