Open clang88 opened 2 years ago
Is no one else experiencing this issue? Unfortunately, this makes the XSL Transformation function almost unusable, because you never know in advance what will go wrong.
I'm no seasoned developer and have no experience with C or C++, but if someone could point me to the XSLT code in the repo I can try and find what potentially is causing this issue.
In the past I haven't used the XSL feature much, but today I happened to need it and I ran head-on into the same issue, @clang88! For my specific use case, the omission of the encoding in the XML header is not too bad... BUT the character encoding issues (in your example, "ü" displaying as "xFC") are show-stoppers for me.
What appears to be happening is that, even though the current document (the source for the transformation) has UTF-8 encoding, something (somewhere) gets converted into ANSI (Windows-1252) encoding. This is evidenced by your "ü" becoming xFC, which is its Windows-1252/ANSI encoded value. In my use case, my XML contains other punctuation characters -- en-dashes, curly quotes, etc. -- and these all come through the XSL transformation as their Windows-1252 single-byte representations too. My en-dashes show up as x96. Curly apostrophes show up as x92. All of these are the single-byte ANSI encodings for these characters. The output file CLAIMS to be UTF-8, and that's why we're seeing x96, x92, xFC, etc.: on their own, these bytes are not valid UTF-8 sequences, so the editor shows them as raw byte values instead of characters.
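The byte values mentioned above can be checked with a short Python sketch. This is only an illustration of the encodings involved, not the XML Tools code; the helper name `describe` is made up for this example. It encodes each character as Windows-1252 and as UTF-8, then tests whether the Windows-1252 byte would decode as UTF-8.

```python
# Characters from this thread: u-umlaut, en dash, right single quotation mark.
SAMPLES = ["\u00fc", "\u2013", "\u2019"]


def describe(ch):
    """Return (cp1252 bytes, utf-8 bytes, whether the cp1252 bytes are valid UTF-8).

    Hypothetical helper for illustration only.
    """
    as_cp1252 = ch.encode("cp1252")
    as_utf8 = ch.encode("utf-8")
    try:
        as_cp1252.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return as_cp1252, as_utf8, valid_utf8


for ch in SAMPLES:
    cp1252_bytes, utf8_bytes, ok = describe(ch)
    print(f"U+{ord(ch):04X}: cp1252={cp1252_bytes.hex()} "
          f"utf-8={utf8_bytes.hex()} cp1252 bytes valid as UTF-8: {ok}")
```

The Windows-1252 bytes come out as fc, 96 and 92, matching what shows up in the transformed output, while the proper UTF-8 encodings are two or three bytes long.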
Any chance someone would be willing to look into this? I will see if I can put together a simple test case.
By the way... if anybody else runs into this... my "workaround" is to
The above only works for me because my example XML contains only characters that exist in Windows-1252/ANSI but are encoded differently in UTF-8. If my source file contained other Unicode characters that are outside of Windows-1252, I don't know what would happen, though the workaround would obviously not work.
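To find out whether a given source file falls into the "outside of Windows-1252" case, something like the following Python sketch could be used. The function name `outside_cp1252` is hypothetical, and the check only reports which characters have no Windows-1252 encoding; it says nothing about how the plugin itself would treat them.

```python
def outside_cp1252(text):
    """Return the distinct characters in text that Windows-1252 cannot encode.

    Hypothetical helper for illustration; order of first appearance is kept.
    """
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Umlauts, en dashes and the euro sign exist in Windows-1252; CJK characters do not.
print(outside_cp1252("Z\u00fcrich \u2013 \u6771\u4eac \u20ac"))
```

An empty result means every character in the text has a single-byte Windows-1252 form; anything listed has no such form.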
I'm using the latest Notepad++ (8.4) with XML Tools 3.1.1.13.
My XSL starts like this:
`<?xml version="1.0" encoding="UTF-8"?>`
What I would expect is that my original UTF-8 XML is transformed with all special characters preserved and `encoding="UTF-8"` added to the declaration. What I get, however, is this: `<?xml version="1.0"?>`