unicode-rs / unicode-segmentation

Grapheme Cluster and Word boundaries according to UAX#29 rules
https://unicode-rs.github.io/unicode-segmentation

Adding missing size_hint for UnicodeSentences, UnicodeWords, and UnicodeWordIndices #128

Closed ryanavella closed 8 months ago

ryanavella commented 8 months ago

I don't expect this to noticeably affect performance, positively or negatively, for most use cases, especially since Iterator::collect relies on the lower bound of size_hint, which remains unchanged by this PR.

However, the upper bound will go from None to Some(upper), which may benefit downstream crates that use it as a heuristic for pre-allocation size.
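As a rough illustration of the kind of downstream heuristic meant here, a caller might size a buffer from the upper bound when one is reported. This is only a sketch: `collect_with_upper_hint` is a hypothetical helper, not part of this crate or of std.

```rust
/// Hypothetical helper (not from this crate): pre-allocates a Vec using the
/// upper bound of `size_hint` when the iterator reports one.
fn collect_with_upper_hint<I: Iterator>(iter: I) -> Vec<I::Item> {
    let (lower, upper) = iter.size_hint();
    // Fall back to the lower bound when the upper bound is None,
    // which is what the default `size_hint` implementation returns.
    let mut out = Vec::with_capacity(upper.unwrap_or(lower));
    out.extend(iter);
    out
}
```

With an upper bound of None this degrades to reserving only the lower bound, so a helper like this gains nothing from iterators that keep the default size_hint. A real caller would probably also cap the reservation, since a loose upper bound can over-allocate.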

Note also that I forwarded the implementation of size_hint to the inner iterator, which means it is UAX#29 agnostic. I'm not enough of a Unicode expert to know if e.g. word boundaries can be empty, so it may not be the tightest possible upper bound for longer strings.
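To show what forwarding looks like in general, here is a minimal std-only sketch. `NonSpaceChars` and `is_not_space` are hypothetical stand-ins for a segmentation iterator wrapping a filtered inner iterator; this is not the code from the PR.

```rust
use std::iter::Filter;
use std::str::Chars;

/// Hypothetical wrapper, standing in for an iterator such as UnicodeWords:
/// it yields the non-whitespace chars of a string.
struct NonSpaceChars<'a> {
    inner: Filter<Chars<'a>, fn(&char) -> bool>,
}

fn is_not_space(c: &char) -> bool {
    !c.is_whitespace()
}

impl<'a> NonSpaceChars<'a> {
    fn new(s: &'a str) -> Self {
        NonSpaceChars {
            inner: s.chars().filter(is_not_space as fn(&char) -> bool),
        }
    }
}

impl<'a> Iterator for NonSpaceChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.inner.next()
    }

    // Forward to the inner iterator instead of inheriting the default
    // (0, None). Filter keeps the lower bound at 0 and passes the inner
    // upper bound through unchanged.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}
```

For the input "a b c" this wrapper reports (0, Some(5)) while yielding only 3 items. The upper bound is valid but not tight, which is the same trade-off described above.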