Closed: KirtashW17 closed this 2 weeks ago
My bad. I didn't expect to get ISO-8859-1. I now understand that Nokogiri returns strings in the encoding of the XML document (or the one explicitly passed as the third positional parameter), and that it replaces any characters that are invalid for that encoding.
Please describe the bug
With Nokogiri >= 1.16.0 I noticed strange behavior when handling ISO-8859-1 XML documents: the content is treated as UTF-8, and invalid characters are replaced with valid UTF-8 bytes, so there is no way to recover the original content.
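To illustrate why the replacement is lossy, here is a minimal sketch in plain Ruby (no Nokogiri involved). The sample bytes are a hypothetical stand-in for the attached file, and `scrub` is only an assumed analogue of the replacement step, not Nokogiri's actual code path.

```ruby
# Hypothetical sample: "café" encoded as ISO-8859-1, where "é" is the single byte 0xE9.
latin1_bytes = "caf\xE9".b

# Label the same bytes as UTF-8. A lone 0xE9 is not a complete UTF-8 sequence.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?

# Replacing the invalid byte produces valid UTF-8, but the original 0xE9 is discarded.
scrubbed = as_utf8.scrub("\uFFFD")
puts scrubbed.valid_encoding?
puts scrubbed.bytes.include?(0xE9)
```

Once the byte has been swapped for a replacement character, nothing in the resulting string says which ISO-8859-1 character it used to be.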
Help us reproduce what you're seeing
I have attached a simple ISO-8859-1 XML example, packed in a ZIP because GitHub doesn't allow XML attachments: test.xml.zip
Expected behavior
I expect to get the original content of the XML file. The XML content should be interpreted as ISO-8859-1 and then converted to UTF-8, or I should at least get a UTF-8 string with invalid bytes that I can later reinterpret as ISO-8859-1.
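The two outcomes described here can be sketched with Ruby's own string encoding methods. This is an illustration under assumptions: the sample bytes are hypothetical, and the code only shows the string-level behavior I was hoping for, not how Nokogiri handles it internally.

```ruby
# Hypothetical sample: "café" encoded as ISO-8859-1 (0xE9 is "é").
latin1_bytes = "caf\xE9".b

# Outcome 1: declare the real source encoding, then transcode to UTF-8.
utf8 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts utf8

# Outcome 2: keep the raw bytes in a string tagged as UTF-8 (invalid, but untouched),
# then reinterpret them as ISO-8859-1 later and transcode.
raw = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
recovered = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts recovered == utf8
```

Both paths only work while the original bytes are still present, which is why replacing invalid bytes before the caller sees them rules out either approach.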
Environment