Open nihen opened 11 years ago
The issue description is not quite accurate - the problem is malformed UTF-8, not double-encoding.
Commit f261fc21cb224569 codified the behaviour for interpolating a variable that does not have SvUTF8() on into a template that does: the variable is assumed to be a sequence of UTF-8 octets, and is converted to characters before interpolation.
However, neither the PP nor the XS code was made robust against the possibility that the variable to be interpolated is not a valid UTF-8 sequence. The PP code uses Encode to convert to characters, and Encode substitutes the replacement character (U+FFFD) in these cases, so the rendered template contains replacement characters in place of the original text.
Worse, the XS code was performing no validation, meaning that such variables were being interpolated verbatim, resulting in malformed UTF-8 in the template.
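To make the two failure modes concrete, here is a small Python sketch. It is only an analogue of the behaviour described above, not the module's PP or XS code: lenient decoding stands in for what Encode does in the PP path, and unchecked byte concatenation stands in for the XS path. The sample value and the helper name `is_valid_utf8` are made up for illustration.

```python
# Latin-1 octets for "café"; 0xE9 on its own is not a valid UTF-8 sequence.
bad_octets = b"caf\xe9"

# Analogue of the PP path: lenient decoding substitutes U+FFFD
# for the malformed byte, so the original character is lost.
lenient = bad_octets.decode("utf-8", errors="replace")

# Analogue of the XS path: the octets are spliced into the
# UTF-8 encoded template without any validation.
template_octets = "\u2603 ".encode("utf-8")  # a template containing a non-ASCII character
spliced = template_octets + bad_octets


def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode cleanly as strict UTF-8."""
    try:
        octets.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False
```

In this sketch `lenient` contains a replacement character, and `is_valid_utf8(spliced)` is false even though `is_valid_utf8(template_octets)` is true, which mirrors the two symptoms reported.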
Neither of these is good. This commit avoids generating both malformed UTF-8 and replacement characters: if the variable is not a valid UTF-8 sequence, it is interpolated as-is (i.e. treated as characters rather than as UTF-8 octets). All existing tests pass, and the test supplied with the issue now also passes.
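The fallback rule can be sketched in a few lines of Python. This illustrates the logic only and is not the patch itself (the attached patch changes the Perl and XS code); the function names and the `{{value}}` placeholder syntax are hypothetical. "Treating the octets as characters" is modelled here as a Latin-1 decode, which maps each byte to the code point with the same value.

```python
def to_characters(octets: bytes) -> str:
    """Decode as UTF-8 when valid; otherwise treat each octet as one character."""
    try:
        # Strict decoding raises instead of substituting U+FFFD.
        return octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the value as-is, one character per octet.
        return octets.decode("latin-1")


def interpolate(template: str, value: bytes) -> str:
    """Insert a byte-string value into a character-string template (hypothetical syntax)."""
    return template.replace("{{value}}", to_characters(value))
```

With this rule, valid UTF-8 input such as `"é".encode("utf-8")` is still decoded to a single character, while Latin-1 input such as `b"caf\xe9"` is passed through unchanged, so the result contains neither malformed UTF-8 nor replacement characters.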
0001-Fix-for-issue-88-Latin-1-text-could-end-up-as-malfor.patch.txt
result