xslate / p5-Text-Xslate

Scalable template engine for Perl5
https://metacpan.org/release/Text-Xslate

latin-1 encoding sometimes mojibake #88

Open nihen opened 11 years ago

nihen commented 11 years ago
#!/usr/bin/env perl
use Test::More;

use utf8;
use Text::Xslate;
my $xslate = Text::Xslate->new();

is $xslate->render_string('<: $string :>', {string => "Ä"})      => 'Ä';
is $xslate->render_string('<: $string :>', {string => "\x{c4}"}) => 'Ä';

is $xslate->render_string('あ<: $string :>', {string => "Ä"})      => 'あÄ';
is $xslate->render_string('あ<: $string :>', {string => "\x{c4}"}) => 'あÄ';

done_testing;

result

ok 1
ok 2
ok 3
not ok 4

pepl commented 9 years ago

The issue description is not quite accurate - the problem is malformed UTF-8, not double-encoding.

Commit f261fc21cb224569 codified how variables without the SvUTF8() flag are handled in templates that have the flag set: the variable is assumed to be a sequence of UTF-8 octets and is converted to characters before interpolation.
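The distinction this rule relies on can be seen with the core `utf8` functions alone. The following is only an illustrative sketch; it does not reproduce Xslate's internals, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $char   = "\x{c4}";     # one character, U+00C4, stored with the UTF8 flag off
my $octets = "\xc3\x84";   # two octets: the UTF-8 encoding of U+00C4, also unflagged

printf "flag on \$char:   %s\n", utf8::is_utf8($char)   ? 'yes' : 'no';
printf "flag on \$octets: %s\n", utf8::is_utf8($octets) ? 'yes' : 'no';

# Treating an unflagged value as UTF-8 octets: utf8::decode converts in place
# and returns false when the octets are not well-formed UTF-8.
my $decoded = $octets;
my $ok      = utf8::decode($decoded);
printf "decoded ok: %s, length %d, U+%04X\n",
    ($ok ? 'yes' : 'no'), length($decoded), ord($decoded);
```

Both values are unflagged, so the flag alone cannot tell a Latin-1 character apart from raw UTF-8 octets; the assumption "unflagged means UTF-8 octets" only holds for the second one.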

However, neither the PP nor the XS code was made robust against the possibility that the variable to be interpolated is not a valid UTF-8 sequence. The PP code uses Encode to convert to characters, and Encode substitutes the replacement character (U+FFFD) in these cases, so the rendered template contains replacement characters instead of the original text.
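The substitution can be reproduced outside Xslate. This sketch assumes a plain `Encode::decode('UTF-8', ...)` call; which Encode function the PP backend actually calls is not stated above, so treat the call shape as an assumption.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "\xc4";   # Latin-1 Ä; a lone 0xC4 is not valid UTF-8

# Default CHECK (FB_DEFAULT): malformed input is replaced with U+FFFD.
my $lenient = Encode::decode('UTF-8', $bytes);
printf "lenient: U+%04X\n", ord($lenient);

# FB_CROAK turns the same input into a fatal error;
# LEAVE_SRC keeps $bytes from being modified by the call.
my $strict = eval {
    Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
print "strict decode failed\n" unless defined $strict;
```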

Worse, the XS code was performing no validation, meaning that such variables were being interpolated verbatim, resulting in malformed UTF-8 in the template.

Neither of these is good. This commit avoids generating malformed UTF-8 and replacement characters by interpolating the variable as-is (i.e. treating it as characters) if it is not a valid UTF-8 sequence. All existing tests pass, and the test supplied with the issue now also passes.
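The fallback described here can be sketched in pure Perl as follows. `to_characters` is a hypothetical helper written for illustration, not a function from Text::Xslate or the attached patch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a character string for interpolation
# into a template that has the UTF8 flag on.
sub to_characters {
    my ($value) = @_;
    return $value if utf8::is_utf8($value);   # already characters

    my $copy = $value;
    # utf8::decode works in place and returns false for malformed UTF-8;
    # in that case fall back to the original value, read as characters.
    return utf8::decode($copy) ? $copy : $value;
}

for my $input ("\xc3\x84", "\xc4") {
    my $out = to_characters($input);
    printf "%d octet(s) in -> %s\n", length($input),
        join ' ', map { sprintf 'U+%04X', ord } split //, $out;
}
```

Under this approach an unflagged Latin-1 string that happens to be valid UTF-8 is still decoded as UTF-8, so the ambiguity is reduced rather than eliminated.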

0001-Fix-for-issue-88-Latin-1-text-could-end-up-as-malfor.patch.txt