Closed techsy730 closed 3 years ago
I'm not entirely sure an algorithmic library of containers is the right place to do this...
True. We could introduce a collector into StringBuilder or String instead, and then let them use the codepoint methods there.
It would introduce an extra copy, yes, but it would also free us from the UTF-16 ugliness.
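A minimal sketch of that idea, using only the JDK. The class `CharCollectSketch` and the helpers `toStringBuilder` and `codePointsOf` are hypothetical names made up for illustration, not part of any existing library API. The chars are first copied into a `StringBuilder` (the extra copy mentioned above), and the JDK's own `codePoints()` then handles the surrogate pairs:

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

final class CharCollectSketch {
    // Hypothetical collector: appends each boxed char to a StringBuilder.
    static Collector<Character, StringBuilder, StringBuilder> toStringBuilder() {
        return Collector.<Character, StringBuilder>of(
            StringBuilder::new,
            (sb, c) -> sb.append(c.charValue()),
            (left, right) -> left.append(right));
    }

    // Collect the chars (one extra copy), then let the JDK pair up surrogates.
    static int[] codePointsOf(Stream<Character> chars) {
        return chars.collect(toStringBuilder()).codePoints().toArray();
    }
}
```

For example, `codePointsOf(Stream.of('a', '\uD83D', '\uDE00'))` yields two code points, `0x61` and `0x1F600`, even though three chars went in.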
CharCollection (and friends) do have methods like intIterator which convert the chars into ints for iteration (and ultimately, an intStream). However, these do not do codepoint conversion; they leave surrogate pairs as separate elements. See https://docs.oracle.com/javase/tutorial/i18n/text/supplementaryChars.html and https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#supplementary
This behavior should remain, since by contract these methods must keep a one-to-one relation between each char element and the int element it is cast to.
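To illustrate the one-to-one contract without depending on the library, here is a small JDK-only sketch. `WideningSketch.widen` is a hypothetical stand-in for the existing widening behavior, not the actual implementation:

```java
final class WideningSketch {
    // Stand-in for the widening contract: every char becomes exactly one int,
    // so a surrogate pair stays as two separate elements.
    static int[] widen(char[] chars) {
        int[] out = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = chars[i]; // plain widening cast, no pairing
        }
        return out;
    }
}
```

For the three chars of `"a\uD83D\uDE00"` this returns three ints (`0x61`, `0xD83D`, `0xDE00`), whereas `String.codePoints()` on the same string reports only two elements.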
However, for CharCollection and friends only, it might be useful to introduce codepoint versions of these widening methods, such as codepointIterator, codepointStream, etc. These would combine such pairs and return a sequence/stream/whatever of true code points.
However, supplementary characters, surrogate pairs, and the like are notoriously mind-bending to handle correctly, so we would want to introduce this with care.
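As a rough sketch of what such a codepoint iterator might look like, here is a JDK-only version over a plain `char[]`. The class name `CodePointIteratorSketch` and the `codePointStream` helper are hypothetical, and the handling of unpaired surrogates (passed through unchanged, as `String.codePointAt` does) is an assumption rather than a decided design:

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

final class CodePointIteratorSketch implements PrimitiveIterator.OfInt {
    private final char[] chars;
    private int pos;

    CodePointIteratorSketch(char[] chars) {
        this.chars = chars;
    }

    @Override
    public boolean hasNext() {
        return pos < chars.length;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) throw new NoSuchElementException();
        char first = chars[pos++];
        // Combine a high surrogate with an immediately following low surrogate.
        if (Character.isHighSurrogate(first)
                && pos < chars.length
                && Character.isLowSurrogate(chars[pos])) {
            return Character.toCodePoint(first, chars[pos++]);
        }
        // BMP char or unpaired surrogate: returned as-is (assumed policy).
        return first;
    }

    // Hypothetical stream counterpart built on top of the iterator.
    static IntStream codePointStream(char[] chars) {
        return StreamSupport.intStream(
            Spliterators.spliteratorUnknownSize(
                new CodePointIteratorSketch(chars), Spliterator.ORDERED),
            false);
    }
}
```

One open design question this makes visible is what to do with an unpaired surrogate: pass it through, replace it, or throw. The sketch passes it through, but that choice would need to be made deliberately.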