lifthrasiir / rust-encoding

Character encoding support for Rust
MIT License

hz-gb-2312 encoding and WHATWG compatibility #84

Closed (aneeshusa closed this 9 years ago)

aneeshusa commented 9 years ago

The WHATWG Encoding Spec lists hz-gb-2312 as a label for the replacement encoding, which uses the UTF-8 encoder and whose decoder signals a special replacement encoding error. However, it looks like this crate implements the actual HZ encoding. For WHATWG compatibility, hz-gb-2312 would have to be folded in with the rest of the replacement encoding labels, but I don't know if that's acceptable, considering other people may be using the current implementation.

Would you prefer to maintain strict WHATWG compatibility or keep the current implementation? If the current implementation is kept, this deviation needs to be well documented - it isn't too hard to work around, but is a bit annoying and could catch someone unaware because the rest of the crate is compatible.
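For readers unfamiliar with the replacement encoding, here is a minimal sketch of its decoder as I read the WHATWG spec: the first byte of input produces a single error, and the decoder then finishes regardless of what follows. The names `Step`, `ReplacementDecoder`, and `decode_replacement` are hypothetical and are not part of this crate's API; the sketch also assumes the "replacement" error mode, in which an error becomes U+FFFD.

```rust
/// What the decoder reports after looking at one item of the input queue.
#[derive(Debug, PartialEq)]
enum Step {
    /// Signal a decoding error (emitted at most once).
    Error,
    /// Decoding is complete; no further input is consumed.
    Finished,
}

/// Hypothetical stand-in for the spec's replacement decoder state.
struct ReplacementDecoder {
    /// Mirrors the spec's "replacement error returned" flag.
    error_returned: bool,
}

impl ReplacementDecoder {
    fn new() -> Self {
        ReplacementDecoder { error_returned: false }
    }

    /// `None` stands for end-of-queue; `Some(byte)` for any input byte.
    fn handle(&mut self, item: Option<u8>) -> Step {
        match item {
            None => Step::Finished,
            Some(_) if !self.error_returned => {
                self.error_returned = true;
                Step::Error
            }
            Some(_) => Step::Finished,
        }
    }
}

/// Decodes `input` with errors mapped to U+FFFD (an assumed error mode).
fn decode_replacement(input: &[u8]) -> String {
    let mut decoder = ReplacementDecoder::new();
    let mut out = String::new();
    let mut bytes = input.iter().copied();
    loop {
        match decoder.handle(bytes.next()) {
            Step::Error => out.push('\u{FFFD}'),
            Step::Finished => break,
        }
    }
    out
}
```

Under these assumptions, any non-empty hz-gb-2312 input decodes to a single U+FFFD, and empty input decodes to an empty string, which is quite different from what a real HZ decoder returns.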

lifthrasiir commented 9 years ago

Confirmed. This is another change I've missed. It would be enough to fix encoding_from_whatwg_label and whatwg_name.

Any deviation from the current WHATWG specification is unintentional and to be fixed. Please file an issue or PR dealing with such deviations. (I'm kind of lazy and not always aware of all changes to the specification, but I think at some point I have implemented all encodings in the specification correctly.)

aneeshusa commented 9 years ago

Haha, I went through the whole spec and this is the last one I've found with regard to names and labels. Do you just want to ...rip out the entire HZ implementation though? I think it'd be a shame to throw it away/not expose it some other way.

lifthrasiir commented 9 years ago

@aneeshusa I want to keep the encoding, simply making it invisible from encoding_from_whatwg_label.
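A rough sketch of what that could look like, not the crate's actual code: the `Enc` enum and both functions below are hypothetical stand-ins, the label lists for UTF-8 and GBK are deliberately partial, and the lowercase name strings are an assumption. The hz-gb-2312 label resolves to the replacement encoding as the spec describes, while the HZ variant stays available to callers who name it directly and simply has no WHATWG name.

```rust
/// Hypothetical encoding handles; the real crate uses its own types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Enc {
    Utf8,
    Gbk,
    /// Still implemented, but unreachable through WHATWG labels.
    Hz,
    Replacement,
}

/// Label lookup in the spirit of the WHATWG "get an encoding" steps:
/// strip ASCII whitespace, compare case-insensitively.
fn from_whatwg_label(label: &str) -> Option<Enc> {
    let trimmed =
        label.trim_matches(|c: char| matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' '));
    match trimmed.to_ascii_lowercase().as_str() {
        // Partial label lists, for illustration only.
        "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some(Enc::Utf8),
        "gbk" | "gb2312" | "chinese" | "x-gbk" => Some(Enc::Gbk),
        // Labels the spec assigns to the replacement encoding.
        "csiso2022kr" | "hz-gb-2312" | "iso-2022-cn" | "iso-2022-cn-ext"
        | "iso-2022-kr" | "replacement" => Some(Enc::Replacement),
        _ => None,
    }
}

/// WHATWG name of an encoding, or `None` when the spec does not define it.
fn whatwg_name(enc: Enc) -> Option<&'static str> {
    match enc {
        Enc::Utf8 => Some("utf-8"),
        Enc::Gbk => Some("gbk"),
        Enc::Replacement => Some("replacement"),
        // HZ is kept in the library but has no WHATWG identity.
        Enc::Hz => None,
    }
}
```

With this shape, `from_whatwg_label("hz-gb-2312")` never returns the HZ implementation, and code that wants real HZ decoding has to opt in by using `Enc::Hz` explicitly.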

aneeshusa commented 9 years ago

OK, that's reasonable. I can put in a PR for that in a few minutes (it should be only a line, I think.)