hsivonen / encoding_rs

A Gecko-oriented implementation of the Encoding Standard in Rust
https://docs.rs/encoding_rs/

Encoding::decode_to_utf16 ? #24

Open SimonSapin opened 7 years ago

SimonSapin commented 7 years ago

I’ve just written this function:

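use std::slice;
use encoding_rs::Encoding;

fn decode_to_utf16(bytes: &[u8], encoding: &'static Encoding) -> Vec<u16> {
    let mut decoder = encoding.new_decoder();
    // Worst-case number of UTF-16 code units for this input; None means overflow.
    let capacity = decoder.max_utf16_buffer_length(bytes.len()).expect("Overflow");
    let mut utf16 = Vec::with_capacity(capacity);
    // View the spare capacity as a slice so the decoder can write into it.
    let uninitialized = unsafe {
        slice::from_raw_parts_mut(utf16.as_mut_ptr(), capacity)
    };
    let last = true;
    let (_, read, written, _) = decoder.decode_to_utf16(bytes, uninitialized, last);
    assert!(read == bytes.len());
    // Only the first `written` code units were filled in by the decoder.
    unsafe {
        utf16.set_len(written)
    }
    utf16
}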

Do you think it would belong as a method of Encoding?

hsivonen commented 7 years ago

> Do you think it would belong as a method of Encoding?

It doesn't exist as a method on Encoding at present, because I thought Rust programs would want to decode to UTF-8 and encode from UTF-16.

If there's a reason to believe that wishing to decode to UTF-16 in the non-streaming manner (with infallible allocation) has utility for Rust programs beyond one isolated case, then it would make sense to add UTF-16 variants of the non-streaming API to Rust, too. (Currently those variants are in C++ only.)
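
For comparison, a caller who already has the text as UTF-8 (for example the `Cow<str>` returned by the existing non-streaming `Encoding::decode`) can get UTF-16 code units with the standard library alone. The sketch below assumes that two-step route is acceptable; the helper name `str_to_utf16` is hypothetical and not part of encoding_rs.

```rust
/// Re-encode already-decoded text as UTF-16 code units.
/// Hypothetical helper for illustration; uses only the standard library.
fn str_to_utf16(text: &str) -> Vec<u16> {
    // `encode_utf16` yields one code unit for BMP characters and a
    // surrogate pair for characters above U+FFFF.
    text.encode_utf16().collect()
}
```

This costs an extra pass over the data and an intermediate UTF-8 buffer, which is what a direct decode-to-UTF-16 method would avoid.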

What's the context of your function? That is, should we expect it to represent a recurring use case or a one-time oddity?

SimonSapin commented 7 years ago

I’ve used this in the Servo implementation of https://xhr.spec.whatwg.org/#json-response which takes a Vec<u8> that was earlier read from the network, and calls a SpiderMonkey function that takes const char16_t* chars, uint32_t len. So it is a rather isolated case.
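
As a rough sketch of the hand-off described above, this is how a UTF-16 buffer might be passed to a consumer with a `(const char16_t* chars, uint32_t len)` shape. `consume_utf16` is a hypothetical Rust stand-in for the real callee, not the SpiderMonkey API, and `pass_utf16` is likewise an illustrative name.

```rust
/// Hypothetical stand-in for a C-style consumer taking a pointer and a
/// 32-bit length. Here it just counts the decoded items in the buffer.
///
/// Safety: `chars` must point to `len` readable, initialized `u16` values.
unsafe fn consume_utf16(chars: *const u16, len: u32) -> usize {
    let units = unsafe { std::slice::from_raw_parts(chars, len as usize) };
    std::char::decode_utf16(units.iter().copied()).count()
}

/// Pass a UTF-16 buffer to the consumer, refusing buffers whose length
/// does not fit in the 32-bit `len` parameter.
fn pass_utf16(units: &[u16]) -> Option<usize> {
    if units.len() > u32::MAX as usize {
        return None;
    }
    // The slice guarantees the pointer is valid for `units.len()` elements.
    Some(unsafe { consume_utf16(units.as_ptr(), units.len() as u32) })
}
```

The explicit length check matters because a `Vec<u16>` length is a `usize`, which can exceed what a `uint32_t` parameter can represent on 64-bit targets.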

(By the way we’re switching Servo to encoding_rs: https://github.com/servo/servo/pull/19073)