mschout / perl-text-template

Expand template text with embedded Perl

How to handle template character encoding? #11

Closed dermoench42 closed 6 years ago

dermoench42 commented 6 years ago

I've been using this module for nearly 15 years in some web services. One annoying thing is a character encoding issue. The module runs on a UTF-8 system with Perl and connects to a UTF-8 database. But when I fill a UTF-8 encoded template with some data and deliver it via Apache, the template parts get double encoded.
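A minimal sketch of one way this kind of double encoding can arise, assuming the template is read as raw bytes, the database values arrive as decoded characters, and the combined page is encoded once on output. The variable names and sample strings are hypothetical; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the template was read without decoding, so "ä" (U+00E4)
# is held as its two UTF-8 bytes.
my $template_bytes = "\xC3\xA4";

# Assumption: the database layer already returned decoded characters,
# here the euro sign (U+20AC).
my $db_chars = "\x{20AC}";

# Joining bytes with characters makes Perl treat each template byte as a
# Latin-1 character, so encoding the result encodes the template twice.
my $broken = encode('UTF-8', $template_bytes . $db_chars);

# Decoding the template on input keeps the output step correct.
my $fixed = encode('UTF-8', decode('UTF-8', $template_bytes) . $db_chars);

print "broken: ", unpack('H*', $broken), "\n";   # c383c2a4e282ac
print "fixed:  ", unpack('H*', $fixed),  "\n";   # c3a4e282ac
```

Only the template part is mangled in the broken case, while the database value comes out intact, which matches the symptom described here.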

I fixed it by adding a binmode F, 'utf8'; in the template load function, but that is somewhat inconvenient.

What is the correct way to handle that issue? Or is it somehow a bug or a potential improvement?

example output function:

use Text::Template;
# ...
sub out($$$) {
    my ( $self, $template, $vars ) = @_;
    my $tmpl =
      Text::Template->new( SOURCE => "$self->{templatepath}$template" );
    return ( $tmpl->fill_in( HASH => $vars ) );
}

I'm using version 1.460 (Gentoo stable), and I saw that the current load_text function has been modernized.

Kind regards,

Ervin

mschout commented 6 years ago

Welcome to the world of Unicode.

What you are doing is actually the recommended way to deal with Unicode in Perl. At every point where data enters or leaves your system, you need to deal with character encoding. This includes data going into and out of your database as well as data going in and out of your app via HTTP or STDIO, for example. If your database containing templates is UTF-8 encoded, you must decode that data at every point where it enters your Perl program. Similarly, at every exit point, if you want to send UTF-8 encoded data, you need to encode the data as UTF-8.
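A minimal sketch of that decode-on-entry, encode-on-exit pattern, using only the core Encode module. The function name handle_request and the greeting text are hypothetical and only illustrate where the two boundaries sit.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical handler: takes UTF-8 bytes from outside the program and
# returns UTF-8 bytes to send back out.
sub handle_request {
    my ($input_bytes) = @_;

    # Entry point: turn bytes into characters.
    my $name = decode('UTF-8', $input_bytes);

    # Inside the program, work on characters only. length() now counts
    # characters, not bytes.
    my $reply = "Hello, $name (" . length($name) . " characters)";

    # Exit point: turn characters back into bytes.
    return encode('UTF-8', $reply);
}

# "Jürgen" as UTF-8 bytes, as it might arrive over HTTP.
print handle_request("J\xC3\xBCrgen"), "\n";
```

Without the decode step, length() would report 7 here because it would count the two bytes of "ü" separately.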

As for getting the template data out of the database, you may be able to have this handled automatically for you. For example, in PostgreSQL, if your database is in UTF-8 encoding, you can set pg_enable_utf8 => 1 in the connect params and DBD::Pg will handle the encode() and decode() for you at the database interface layer.

dermoench42 commented 6 years ago

I know a few things about Unicode. The templates don't reside in the database. They are in the filesystem and are loaded from there.

They are loaded with the code snippet and then printed out via stdout to the webserver.

How does Text::Template know which encoding to apply on load?

How can I tell Text::Template that it should load and process a template with UTF-8 encoding?

Ervin

mschout commented 6 years ago

Ok, I misunderstood what you were saying originally.

Text::Template is not currently encoding-aware. It would be nice if an encoding could be specified when the SOURCE is a filename.

In the meantime, one solution would be to open the template file yourself, apply the :encoding(UTF-8) layer to the filehandle, and pass the filehandle to Text::Template instead of a filename.

mschout commented 6 years ago

e.g.:

open my $fh, '<', 'template.txt' or die "Cannot open template.txt: $!";
binmode $fh, ':encoding(UTF-8)';
my $t = Text::Template->new(TYPE => 'FILEHANDLE', SOURCE => $fh);
...
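To see what the :encoding(UTF-8) layer changes, here is a small sketch that uses only core modules and leaves Text::Template out. It writes a hypothetical template to a temporary file and reads it back once as raw bytes and once through the layer. The helper name slurp and the template text are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 encoded template ("Grüße, {$name}!") as raw bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "Gr\xC3\xBC\xC3\x9Fe, {\$name}!\n";
close $out or die "Cannot close $path: $!";

# Hypothetical helper: read a whole file through the given PerlIO layer.
sub slurp {
    my ($file, $layer) = @_;
    open my $fh, "<$layer", $file or die "Cannot open $file: $!";
    local $/;                      # slurp mode: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

my $raw     = slurp($path, ':raw');              # undecoded bytes
my $decoded = slurp($path, ':encoding(UTF-8)');  # decoded characters

print "raw length:     ", length($raw),     "\n";   # 18
print "decoded length: ", length($decoded), "\n";   # 16
```

A handle opened with the decoding layer is what would be passed as SOURCE with TYPE => 'FILEHANDLE', so the template text reaches Text::Template as characters rather than bytes.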

Ideally you should be able to just say:

Text::Template->new(SOURCE => 'whatever.txt', ENCODING => 'UTF-8')

I'll add that for the next version.

mschout commented 6 years ago

Can you tell me if this fixes the issue for you before I push this release to CPAN?

dermoench42 commented 6 years ago

Hello Michael,

I just cloned the repo and applied the module and encoding settings to my site. Result:

What I thought: when the ENCODING setting is omitted, what would be better:

Many thanks and kind regards,

Ervin

(Weimar, Germany)

mschout commented 6 years ago

Given the age of this module and its deployed base, I am EXTREMELY reluctant to change its existing behavior if ENCODING is not given at the risk of breaking existing code.

Therefore, if no ENCODING is given, the module will keep its old behavior (that is, you get bytes, not characters).