Open nox opened 7 years ago
It would be reasonable to make this a breaking change while the number of direct dependants is acceptably low. (Presently, there are 22 publicly listed on crates.io). I feel that Boolean blindness would be unflattering in a stable API.
What should the new API look like? A few options:
Replace the boolean with an enum, so the API becomes graphemes(Extended)
or graphemes(Legacy)
.
Deprecate graphemes
, and introduce new methods extended_graphemes()
and legacy_graphemes()
.
Change graphemes()
to always return extended grapheme clusters, and add a new method legacy_graphemes()
.
I don't like (1) because the calls become very long, it requires importing more names, and like the current API it doesn't offer much guidance. I don't like (2) for most of the same reasons.
I like (3) because it makes the common case shorter, guiding people to the method that is most often correct. The downside is that it does not allow a deprecation period, but I don't think this is a big problem in this case.
- Change graphemes() to always return extended grapheme clusters, and add a new method legacy_graphemes().
This would be my choice as well. I don't personally mind importing enums, but extended grapheme clustering is clearly recommended[1] by the relevant Unicode annex and it seems sensible to offer it as an implicit default.
[1] Per the Unicode Standard Annex #29:
the extended grapheme cluster boundaries are recommended for general processing, while the legacy grapheme cluster boundaries are maintained primarily for backwards compatibility with earlier versions of this specification
+1 for 3/
(Note that, as of #23, this issue also applies to the newly added cursor API.)
An alternative option to introduce a more ergonomic API without breaking backwards compatibility is to use functions not grouped into a trait but accessible from the module directly. UnicodeSegmentation
trait can then be deprecated and removed eventually. The trait is implemented for str
so it won't be possible to call the methods directly on a string (i.e. s.graphemes()
vs graphemes(s)
) but that's pretty minor annoyance. A small additional benefit is that import will be less stuttering: use unicode_segmentation::graphemes
instead of use unicode_segmentation::UnicodeSegmentation
.
I'll be happy to submit a patch too. Or maybe @xfix is interested in updating #31. Would be good to move it forward.
I would also support option 3. It would be a breaking change but I guess that's fine. We can probably release a 1.3.1 which translates the new API into the old, so that we can move forward gradually.
Another +1 for option 3 here.
The boolean argument is quite useless when reading the code, because
foo.graphemes(true)
doesn't state what thattrue
thing is for.