Currently the only way to specify the charset is inside the document itself (with a BOM or `<meta charset=`). If the charset is known but not declared in the document, there is no way to supply it externally.
Additionally, charset detection does not always work well, even with `Heuristics.ALL`. In particular, it fails to recognize UTF-8, at least when the first non-ASCII byte appears late in the document. The WHATWG spec recommends, in a non-normative note, that systems be able to recognize UTF-8 even if they aren't good at detecting other charsets:
> The UTF-8 encoding has a highly detectable bit pattern. Files from the local file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents with byte sequences that do not match it are very likely not. When a user agent can examine the whole file, rather than just the preamble, detecting for UTF-8 specifically can be especially effective. [PPUTF8][UTF8DET]
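To illustrate how cheap the check described in that note is, here is a minimal sketch that scans the *whole* buffer rather than a prefix, so a non-ASCII byte late in the document is still seen. `Utf8Sniffer` and its `Result` enum are hypothetical names for illustration only; they are not part of the parser's API. The byte ranges follow the UTF-8 well-formedness rules (no overlong forms, no surrogates, nothing above U+10FFFF), and a sequence cut off by the end of the file is treated as not UTF-8, which assumes the caller passes the complete file.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: classifies a complete byte buffer against the UTF-8 bit pattern. */
public final class Utf8Sniffer {

    public enum Result { ASCII_ONLY, UTF8, NOT_UTF8 }

    private Utf8Sniffer() {}

    public static Result classify(byte[] data) {
        boolean sawNonAscii = false;
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            if (b < 0x80) {            // ASCII byte: says nothing about the charset
                i++;
                continue;
            }
            sawNonAscii = true;
            int needed;                          // continuation bytes expected after the lead byte
            int lower = 0x80, upper = 0xBF;      // allowed range for the FIRST continuation byte
            if (b >= 0xC2 && b <= 0xDF) {
                needed = 1;
            } else if (b >= 0xE0 && b <= 0xEF) {
                needed = 2;
                if (b == 0xE0) lower = 0xA0;      // reject overlong 3-byte forms
                else if (b == 0xED) upper = 0x9F; // reject UTF-16 surrogate code points
            } else if (b >= 0xF0 && b <= 0xF4) {
                needed = 3;
                if (b == 0xF0) lower = 0x90;      // reject overlong 4-byte forms
                else if (b == 0xF4) upper = 0x8F; // reject code points above U+10FFFF
            } else {
                return Result.NOT_UTF8;           // 0x80-0xC1 and 0xF5-0xFF never start a sequence
            }
            if (i + needed >= data.length) {
                return Result.NOT_UTF8;           // sequence truncated by end of file
            }
            for (int k = 1; k <= needed; k++) {
                int c = data[i + k] & 0xFF;
                int lo = (k == 1) ? lower : 0x80;
                int hi = (k == 1) ? upper : 0xBF;
                if (c < lo || c > hi) {
                    return Result.NOT_UTF8;       // byte does not fit the UTF-8 pattern
                }
            }
            i += needed + 1;
        }
        return sawNonAscii ? Result.UTF8 : Result.ASCII_ONLY;
    }

    public static void main(String[] args) {
        // Non-ASCII character placed late in an otherwise ASCII document (requires Java 11+).
        String doc = "<!DOCTYPE html><p>" + "x".repeat(10_000) + "\u00e9</p>";
        System.out.println(classify(doc.getBytes(StandardCharsets.UTF_8)));
        // ISO-8859-1 encodes U+00E9 as the single byte 0xE9, same as Windows-1252.
        System.out.println(classify(doc.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The UTF-8 encoding of the late `é` is classified as `UTF8`, while the single-byte encoding is rejected as soon as the lone `0xE9` is followed by an ASCII `<`. Only a `NOT_UTF8` result would justify falling back to a default such as Windows-1252.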
This reproduces with multiple test documents; the smallest is below. Another one produced a warning that the UTF-8 character was invalid in Windows-1252, meaning the detection fell back to the default, which was a particularly bad guess.