servo / unicode-bidi

Implementation of the Unicode Bidirectional Algorithm in Rust

We should have APIs that accept potentially-invalid UTF8 #135

Open Manishearth opened 2 months ago

Manishearth commented 2 months ago

Our UTF16 API is able to handle invalid UTF16 by pretending unpaired surrogates are U+FFFD REPLACEMENT CHARACTERs.
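For reference, the replacement behaviour described above can be reproduced with the standard library alone. This is only a sketch: `decode_utf16_lossy` is a hypothetical helper name, not a function in unicode-bidi, and the crate's own UTF-16 code may be structured differently.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Hypothetical helper (not part of unicode-bidi): decode UTF-16 code
/// units, mapping every unpaired surrogate to U+FFFD.
fn decode_utf16_lossy(units: &[u16]) -> Vec<char> {
    decode_utf16(units.iter().copied())
        // `decode_utf16` yields Err for each unpaired surrogate.
        .map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}
```

Each unpaired surrogate becomes exactly one U+FFFD, so the number of output characters still lines up with the input code units that produced them.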

It would be nice to be able to do the same for the UTF8 API.

We could implement TextSource for [u8] and have a whole other set of BidiInfo copies for unvalidated UTF8.
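A rough sketch of what character iteration over unvalidated bytes might look like under this option, using only `std`. The name `lossy_char_indices` is an assumption made for illustration, and this is not an implementation of the crate's `TextSource` trait. It also replaces each invalid sequence with a single U+FFFD, which is one possible policy among several.

```rust
/// Hypothetical helper (not part of unicode-bidi): walk `bytes` and return
/// `(byte_offset, char)` pairs, emitting U+FFFD for each invalid sequence.
fn lossy_char_indices(bytes: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut offset = 0; // byte offset of `rest` within `bytes`
    let mut rest = bytes;
    loop {
        match std::str::from_utf8(rest) {
            Ok(s) => {
                out.extend(s.char_indices().map(|(i, c)| (offset + i, c)));
                return out;
            }
            Err(e) => {
                let valid = e.valid_up_to();
                // The prefix up to `valid` is known to be well-formed.
                let s = std::str::from_utf8(&rest[..valid]).expect("validated prefix");
                out.extend(s.char_indices().map(|(i, c)| (offset + i, c)));
                out.push((offset + valid, '\u{FFFD}'));
                // `error_len()` is None when the input ends mid-sequence;
                // in that case the remainder is consumed as one error.
                let skip = e.error_len().unwrap_or(rest.len() - valid);
                offset += valid + skip;
                rest = &rest[valid + skip..];
            }
        }
    }
}
```

The offsets refer to positions in the original byte slice, so levels computed per byte would still index into the caller's buffer.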

We could also have .char_indices() bail at the first invalid UTF-8 sequence, and then use the same BidiInfo with a separate constructor that is documented to accept invalid UTF-8 and truncate the returned levels to the valid prefix.
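The "bail" option amounts to finding the longest valid prefix. A minimal sketch with `std::str::from_utf8` follows; `valid_utf8_prefix` is a hypothetical name, and how a new constructor would actually be wired into BidiInfo is left open.

```rust
/// Hypothetical helper (not part of unicode-bidi): return the longest
/// prefix of `bytes` that is valid UTF-8.
fn valid_utf8_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => {
            // `valid_up_to()` is the byte index where validation stopped,
            // so everything before it is well-formed.
            std::str::from_utf8(&bytes[..e.valid_up_to()]).expect("validated prefix")
        }
    }
}
```

The length of the returned prefix is where the levels would be truncated; bytes after it get no level at all, which is the trade-off compared with the U+FFFD approach.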

Manishearth commented 2 months ago

cc @robertbastian

I'm inclined to do the "bail on first non-UTF8" thing for now since it's a smaller change.

If we ever do a 2.0, we should make this code generic over encodings.