Open BartMassey opened 3 years ago
`encode_utf16` is using the platform's native endian. This is made clear when a `u32` is cast directly to a `u16` without converting the endian. I do agree that it may be good to explicitly document this.

The decode functions also assume native-endian UTF-16. This makes sense as a default. If necessary, endian conversion can be done before decoding or after encoding by mapping the `&[u16]` slice to the required endian.
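As a concrete illustration of that point (a minimal sketch written for this thread, not code from std), the `u16` code units produced by `encode_utf16()` are native-endian values, so a caller who needs a specific byte order can map each unit through `to_le_bytes()` / `to_be_bytes()`:

```rust
fn main() {
    let s = "héllo";

    // encode_utf16() yields u16 code units in the platform's native endian.
    let units: Vec<u16> = s.encode_utf16().collect();

    // To emit an explicit byte order, convert each code unit yourself,
    // e.g. UTF-16LE:
    let le_bytes: Vec<u8> = units.iter().flat_map(|u| u.to_le_bytes()).collect();

    // ...or UTF-16BE:
    let be_bytes: Vec<u8> = units.iter().flat_map(|u| u.to_be_bytes()).collect();

    // 'é' (U+00E9) encodes as E9 00 in LE and 00 E9 in BE.
    assert_ne!(le_bytes, be_bytes);
    println!("LE: {:02x?}", le_bytes);
    println!("BE: {:02x?}", be_bytes);
}
```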
> `encode_utf16` is using the platform's native endian. This is made clear when a `u32` is cast directly to a `u16` without converting the endian.

Thanks. My read was too quick.

> I do agree that it may be good to explicitly document this.

I can submit a PR if folks like.

> The decode functions also assume native-endian UTF-16.
I am now thoroughly confused, as usual. I swear I saw something with endianness somewhere in std, but I can't find it now.
Anyhow, I can add the documentation about endianness and the lack of a BOM in the appropriate spots. LMK what you think of me getting a PR together.
Thanks!
The documentation does not specify the endianness of `str::encode_utf16()` and `char::encode_utf16()`: it looks from the source like they are big-endian (UTF-16BE), but I may be reading it wrong and they are little-endian (UTF-16LE) or native-endian. This may be a deliberate design decision: if so, I think it should be reconsidered, as the encoding is useless for some purposes if you don't know its endianness.

It would also be nice to indicate whether `str::encode_utf16()` inserts a byte-order mark (BOM): pretty sure it does not from the source, which is fine. It is probably too late to rename these functions or to add equivalents of opposite endianness at this point, which is too bad. It's an odd API given that the corresponding decode functions have little-endian and big-endian variants.