Our UTF-16 API handles invalid UTF-16 by treating each unpaired surrogate as U+FFFD REPLACEMENT CHARACTER.
It would be nice to be able to do the same for the UTF-8 API.
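
For reference, the UTF-16 behaviour described above can be approximated with the standard library alone. This is only a sketch: `decode_utf16_lossy` is a hypothetical helper name, not something the crate exposes, and the crate's own UTF-16 handling may be implemented differently.

```rust
/// Hypothetical helper (not part of the crate): decode UTF-16 code units,
/// mapping every unpaired surrogate to U+FFFD REPLACEMENT CHARACTER.
fn decode_utf16_lossy(units: &[u16]) -> Vec<char> {
    // `char::decode_utf16` yields `Err` for each unpaired surrogate.
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

With this helper, a lone lead surrogate between `a` and `b` decodes to `a`, U+FFFD, `b`, so the rest of the text stays usable.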
One option is to implement `TextSource` for `[u8]` and add a whole separate set of `BidiInfo` copies for unvalidated UTF-8.
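
A `TextSource` implementation for `[u8]` would need a character iterator that tolerates invalid bytes. The sketch below shows one way that could work, assuming we want each invalid sequence to surface as a single U+FFFD at the byte offset where it starts. `lossy_char_indices` is a hypothetical name, and the function returns a `Vec` rather than an iterator only to keep the example short.

```rust
/// Hypothetical helper: like `str::char_indices`, but over raw bytes.
/// Each maximal invalid sequence is reported as one U+FFFD.
fn lossy_char_indices(bytes: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        match std::str::from_utf8(&bytes[offset..]) {
            Ok(valid) => {
                // The rest of the input is valid UTF-8.
                out.extend(valid.char_indices().map(|(i, c)| (offset + i, c)));
                break;
            }
            Err(err) => {
                // Emit the valid prefix first.
                let valid_len = err.valid_up_to();
                let valid = std::str::from_utf8(&bytes[offset..offset + valid_len])
                    .expect("prefix up to valid_up_to() is valid UTF-8");
                out.extend(valid.char_indices().map(|(i, c)| (offset + i, c)));

                // Then one replacement character for the invalid sequence.
                let bad_start = offset + valid_len;
                out.push((bad_start, char::REPLACEMENT_CHARACTER));

                // `error_len()` is `None` when the input ends mid-sequence;
                // in that case the remaining bytes are all consumed.
                let bad_len = err.error_len().unwrap_or(bytes.len() - bad_start);
                offset = bad_start + bad_len;
            }
        }
    }
    out
}
```

The byte offsets stay aligned with the original slice, which matters if levels are reported per byte as they are for `str` input.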
Alternatively, `.char_indices()` could stop at the first invalid UTF-8 sequence, and we could reuse the existing `BidiInfo` with a separate constructor that is documented to accept invalid UTF-8 and to truncate the returned levels at that point.
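
The truncating approach can be sketched with `std::str::from_utf8` and `Utf8Error::valid_up_to`. `valid_utf8_prefix` is a hypothetical helper name; how a new `BidiInfo` constructor would actually consume the prefix is left open here.

```rust
/// Hypothetical helper: return the longest prefix of `bytes` that is valid
/// UTF-8. Everything from the first invalid byte onward is dropped.
fn valid_utf8_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        // `valid_up_to()` is the length of the leading valid portion.
        Err(err) => std::str::from_utf8(&bytes[..err.valid_up_to()])
            .expect("prefix up to valid_up_to() is valid UTF-8"),
    }
}
```

A constructor built this way could hand the resulting `&str` to the existing code path, so the returned levels would cover only `valid_utf8_prefix(bytes).len()` bytes. The caller would need to compare that length against the input length to detect truncation.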