Open CAD97 opened 1 year ago
Perhaps as an unresolved question: with these added, FromUtf16Error's Display impl is no longer always accurate; it says "invalid utf-16: lone surrogate found", but these functions introduce a new failure case: the &[u8] was of odd length. Making FromUtf16Error hold information about which kind of error occurred would require making it not a ZST anymore, which could degrade performance since currently (see below) Result<String, FromUtf16Error> is (non-guaranteed-ly) null-pointer-optimized to be the same size as String.
Alternately, they could return some new FromUtf16BytesError type which can represent both errors, so that String::from_utf16 can still return the null-pointer-optimized Result<String, FromUtf16Error>.
(Alternately, FromUtf16Error's Display impl could be updated to say something like "invalid utf-16: lone surrogate found, or odd length byte string passed".)
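To make the "new error type" option concrete, here is a minimal sketch on stable Rust. It is not the std implementation: `from_utf16le_bytes` is a hypothetical stand-in for the feature-gated function, the variants of `FromUtf16BytesError` and the odd-length message are assumptions made for illustration, and only the "lone surrogate found" message is taken from the existing Display impl quoted above.

```rust
use std::fmt;

/// Hypothetical error type (not part of std) that distinguishes both failure cases.
#[derive(Debug, PartialEq, Eq)]
pub enum FromUtf16BytesError {
    /// The input slice had an odd number of bytes.
    OddLength,
    /// A surrogate code unit had no valid partner.
    LoneSurrogate,
}

impl fmt::Display for FromUtf16BytesError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Assumed wording, chosen only for this sketch.
            Self::OddLength => f.write_str("invalid utf-16: odd length byte string passed"),
            // Wording of the existing FromUtf16Error message.
            Self::LoneSurrogate => f.write_str("invalid utf-16: lone surrogate found"),
        }
    }
}

impl std::error::Error for FromUtf16BytesError {}

/// Hypothetical stable-Rust stand-in for a little-endian, byte-based decoder.
pub fn from_utf16le_bytes(v: &[u8]) -> Result<String, FromUtf16BytesError> {
    // The new failure case: the bytes cannot be split into whole u16 units.
    if v.len() % 2 != 0 {
        return Err(FromUtf16BytesError::OddLength);
    }
    // Reassemble little-endian code units, then decode them as UTF-16.
    let units = v
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    char::decode_utf16(units)
        .collect::<Result<String, _>>()
        .map_err(|_| FromUtf16BytesError::LoneSurrogate)
}
```

With a type like this, a caller can match on the variant instead of parsing the message, and String::from_utf16 keeps its current error type untouched.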
To note, Result<String, enum { L, R }> is still niched. The data pointer is null and the other 2×usize are available to carry the Err payload. The only performance hit would be constructing or inspecting the error payload.
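The size claim can be checked locally with a small sketch like the one below. `ZstError` and `Kind` are hypothetical stand-ins (for the current zero-sized error and for the `enum { L, R }` above), and since Rust does not guarantee this layout, the numbers are whatever the compiler in use happens to produce.

```rust
use std::mem::size_of;

/// Hypothetical zero-sized error, standing in for today's FromUtf16Error.
#[allow(dead_code)]
pub struct ZstError;

/// Hypothetical two-variant payload, the `enum { L, R }` from the comment above.
#[allow(dead_code)]
pub enum Kind {
    L,
    R,
}

/// Returns the sizes of String, Result<String, ZstError> and Result<String, Kind>, in bytes.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<String>(),
        size_of::<Result<String, ZstError>>(),
        size_of::<Result<String, Kind>>(),
    )
}
```

If all three numbers match, the Err payload is being stored in the otherwise unused part of the String representation, which is the situation described above. Treat that as an observation about one compiler version, not as something code may rely on.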
But that said, I also think just rendering the error as invalid utf-16 would be sufficient. Adding a new variant to the existing enum is also fine, but I don't think making a new error type is particularly helpful.
An alternative would be to panic if given an odd-length slice, since that's trivial to precheck. But not a particularly good alternative.
Feature gate: #![feature(str_from_utf16_endian)]
This is a tracking issue for versions of String::from_utf16 which take &[u8] and use a specific endianness.
Public API
Steps / History
Unresolved Questions
Naming: from_utf16le, from_utf16_le, from_le_utf16, from_le_utf16_bytes, and other such combinations.
with_capacity + push implementation used for from_utf16 while collect doesn't reserve capacity? (#48994)
FromUtf16Error currently displays as "invalid utf-16: lone surrogate found", which isn't correct for an error due to odd byte length.
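For the with_capacity + push question, the two shapes being compared look roughly like the sketch below. Both function names are hypothetical and neither body is copied from std; the sketch only shows that the first version leaves allocation to `collect`, which, when collecting through a `Result`, may not receive a useful lower-bound size hint, while the second reserves space up front.

```rust
use std::char::DecodeUtf16Error;

/// Decode by collecting; how much capacity is reserved is left to `collect`.
pub fn decode_with_collect(v: &[u16]) -> Result<String, DecodeUtf16Error> {
    char::decode_utf16(v.iter().copied()).collect()
}

/// Decode by reserving up front and pushing one char at a time.
pub fn decode_with_capacity(v: &[u16]) -> Result<String, DecodeUtf16Error> {
    // One byte per code unit is only a starting guess; the String still
    // grows if the UTF-8 encoding turns out to need more.
    let mut out = String::with_capacity(v.len());
    for decoded in char::decode_utf16(v.iter().copied()) {
        // Stop at the first unpaired surrogate.
        out.push(decoded?);
    }
    Ok(out)
}
```

Both versions return the same result for the same input; the difference is only in how many reallocations happen along the way, which is what would need measuring before choosing one for the byte-based functions.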