Open pawelzwronek opened 1 day ago
> This type of HTML file can only detect the keyword `charset=ISO-8859-1` in its content to obtain the string `ISO-8859-1`, which is then converted to a codepage.
`Notepad_plus::getHtmlXmlEncoding` already does that. The question is whether this is expected and desired behaviour to implement.
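To make the quoted step concrete, here is a minimal sketch of extracting a declared charset from HTML content and mapping it to a Windows code page. It is not the code of `Notepad_plus::getHtmlXmlEncoding`; the names `declaredCharset` and `codepageFor` are hypothetical, and the code page table is a small illustrative subset.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the name from the first `charset=...`
// declaration found in `content`, upper-cased, or nullopt if none exists.
std::optional<std::string> declaredCharset(const std::string& content) {
    // Accepts charset=ISO-8859-1, charset="utf-8" and charset='utf-8'.
    static const std::regex re(R"(charset\s*=\s*["']?([A-Za-z0-9_.:-]+))",
                               std::regex::icase);
    std::smatch m;
    if (!std::regex_search(content, m, re)) return std::nullopt;
    std::string name = m[1].str();
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return name;
}

// Illustrative subset of Windows code page identifiers (assumption:
// a real editor would carry a much larger table).
std::optional<int> codepageFor(const std::string& charset) {
    if (charset == "UTF-8") return 65001;
    if (charset == "ISO-8859-1") return 28591;
    if (charset == "WINDOWS-1252") return 1252;
    return std::nullopt;
}
```

For content containing `charset=ISO-8859-1`, `declaredCharset` yields `ISO-8859-1`, which `codepageFor` maps to 28591.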
> This is because the encoding obtained through the uchardet component has a confidence level of `confidence = 0.5f;`, making it uncertain what the encoding actually is.
Actually, `uchardet` detection only runs when a file is opened or reloaded, and only when the encoding is set to UTF-8 or is undefined. Moreover, it is not run when an .html or .xml file is opened and an encoding is detected with `getHtmlXmlEncoding`.
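The conditions described above can be sketched as a small decision function. This is only my reading of the behaviour, not the actual Notepad++ control flow; the enum names and `chooseDetector` are hypothetical, and the order in which the checks are applied is an assumption.

```cpp
// Hypothetical, simplified view of the buffer's current encoding setting.
enum class Encoding { Undefined, Utf8, Other };

// Which detection mechanism (if any) would run for a given situation.
enum class Detector { None, HtmlXmlDeclaration, Uchardet };

Detector chooseDetector(bool openingOrReloading,
                        Encoding current,
                        bool isHtmlOrXml,
                        bool declarationFound) {
    // Detection is tied to open/reload; saving triggers nothing.
    if (!openingOrReloading) return Detector::None;

    // A charset declared inside an .html/.xml file takes precedence,
    // so uchardet is skipped in that case.
    if (isHtmlOrXml && declarationFound) return Detector::HtmlXmlDeclaration;

    // uchardet only runs when the encoding is UTF-8 or undefined.
    if (current == Encoding::Utf8 || current == Encoding::Undefined)
        return Detector::Uchardet;

    return Detector::None;
}
```

Under this sketch, saving an untitled buffer as .html never reaches either detector, which is the gap this issue is about.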
Is there an existing issue for this?
Description of the Issue
`charset=ISO-8859-1` encoding.
Describe the solution you'd like.
I propose autodetecting the encoding from the file content when an untitled buffer is saved as .html, and switching from the default UTF-8 to the detected encoding. The current version of N++ already performs this autodetection when opening an .html file.
With the proposed autodetection, the encoding of the saved file would switch to ISO-8859-1 and the browser would display the characters correctly.
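A rough sketch of the proposed save-time policy, assuming the charset declaration has already been extracted from the buffer content. `hasHtmlExtension` and `encodingForSave` are hypothetical names, and encodings are represented as plain strings only for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// True when the target file name ends in ".html" (case-insensitive).
bool hasHtmlExtension(const std::string& path) {
    const auto dot = path.find_last_of('.');
    if (dot == std::string::npos) return false;
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == ".html";
}

// Proposed policy: only an untitled buffer being saved as .html with a
// declared charset switches encoding; every other save keeps the current one.
std::string encodingForSave(bool wasUntitled,
                            const std::string& targetPath,
                            const std::string& currentEncoding,
                            const std::optional<std::string>& declaredCharset) {
    if (wasUntitled && hasHtmlExtension(targetPath) && declaredCharset)
        return *declaredCharset;
    return currentEncoding;
}
```

Restricting the switch to untitled buffers keeps an explicit encoding choice on an existing file from being silently overridden; whether that restriction is wanted is part of the open question.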
Debug Information
Anything else?
No response