Open lifthrasiir opened 11 years ago
I think that `BOMAwareUTF8Encoding` is the wrong approach. Rather, what’s needed is what the spec calls decode.
It could be a `BOMDecoder` (or some other name) that takes a "fallback encoding" parameter. When the input starts with a BOM, the BOM is stripped and the corresponding encoding is used. Otherwise, the fallback encoding is used.
This decoder should always be used for formats that support multiple encodings, because the BOM (by proximity to the data itself) is more accurate than other metadata.
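
A minimal sketch of the decoder described above, using only the standard library. The names `UnicodeEncoding`, `sniff_bom`, and `decode_with_fallback` are hypothetical and not part of this crate's API, and as a simplifying assumption the fallback is restricted to the same three Unicode encodings that a BOM can identify; a real implementation would accept any supported encoding as the fallback.

```rust
/// Encodings that a BOM can identify (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnicodeEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the encoding named by a leading BOM together with the BOM's
/// length in bytes, or `None` if the input does not start with a BOM.
pub fn sniff_bom(input: &[u8]) -> Option<(UnicodeEncoding, usize)> {
    if input.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((UnicodeEncoding::Utf8, 3))
    } else if input.starts_with(&[0xFF, 0xFE]) {
        Some((UnicodeEncoding::Utf16Le, 2))
    } else if input.starts_with(&[0xFE, 0xFF]) {
        Some((UnicodeEncoding::Utf16Be, 2))
    } else {
        None
    }
}

/// Decodes UTF-16 bytes lossily; a trailing odd byte becomes U+FFFD.
fn decode_utf16(bytes: &[u8], to_u16: fn([u8; 2]) -> u16) -> String {
    let units: Vec<u16> = bytes
        .chunks(2)
        .map(|pair| {
            if pair.len() == 2 {
                to_u16([pair[0], pair[1]])
            } else {
                0xFFFD
            }
        })
        .collect();
    String::from_utf16_lossy(&units)
}

/// If the input starts with a BOM, strips it and uses the encoding it names;
/// otherwise uses `fallback`. Returns the text and the encoding actually used.
pub fn decode_with_fallback(
    input: &[u8],
    fallback: UnicodeEncoding,
) -> (String, UnicodeEncoding) {
    let (encoding, skip) = match sniff_bom(input) {
        Some((detected, bom_len)) => (detected, bom_len),
        None => (fallback, 0),
    };
    let body = &input[skip..];
    let text = match encoding {
        UnicodeEncoding::Utf8 => String::from_utf8_lossy(body).into_owned(),
        UnicodeEncoding::Utf16Le => decode_utf16(body, u16::from_le_bytes),
        UnicodeEncoding::Utf16Be => decode_utf16(body, u16::from_be_bytes),
    };
    (text, encoding)
}
```

Note that the BOM wins even when the caller passes a different fallback, which is the point of the proposal: external metadata only matters when the data carries no BOM.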
@SimonSapin I have updated the description. I agree that this use case should be handled differently; see #19 for a separate discussion. A BOM-aware encoding might still be useful on its own, though.
This issue was spotted during the removal of `TextEncoder` and `TextDecoder` (#4). `TextDecoder` has an ability to automatically strip the BOM (U+FEFF) from the input string, if any.
We need to emulate this in a separate encoding, perhaps `BOMAwareUTF8Encoding` (whose `whatwg_name()` is still `utf-8`)?
This use case itself can be handled better by decoders with a fallback encoding (#19), but we may still need BOM-attached Unicode encodings from time to time: many applications of UTF-16 require a BOM, for example.
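
For illustration, a BOM-attached UTF-16 encoder could look roughly like the sketch below. The function name `encode_utf16le_with_bom` is hypothetical and not part of this crate; the sketch uses only the standard library and assumes little-endian output.

```rust
/// Encodes `text` as UTF-16LE and prepends the byte order mark (U+FEFF).
/// Hypothetical helper for this sketch, not an existing API of this crate.
pub fn encode_utf16le_with_bom(text: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + text.len() * 2);
    // U+FEFF in little-endian byte order is FF FE.
    out.extend_from_slice(&0xFEFFu16.to_le_bytes());
    for unit in text.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}
```

The sketch always writes a BOM, even for empty input, and does not check whether `text` already begins with U+FEFF.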