Closed dermoench42 closed 6 years ago
Welcome to the world of Unicode.
What you are doing is actually the recommended way to deal with Unicode in Perl. At every point where data enters or leaves your system, you need to deal with character encoding. This includes data going into and out of your database as well as data going in and out of your app via HTTP or STDIO, for example. If your database containing templates is UTF-8 encoded, you must decode that data at every point where it enters your Perl program. Similarly, at every exit point, if you want to send UTF-8 encoded data, you need to encode the data as UTF-8.
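A minimal sketch of the decode-on-entry, encode-on-exit pattern described above, using the core Encode module. The `greet` function and the sample name are hypothetical and only serve as illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 bytes in, returns UTF-8 bytes out.
sub greet {
    my ($name_bytes) = @_;

    # Entry point: turn UTF-8 bytes into a Perl character string.
    my $name = decode('UTF-8', $name_bytes);

    # Inside the program, length() now counts characters, not bytes.
    my $message = "Hello, $name (" . length($name) . " characters)";

    # Exit point: turn the character string back into UTF-8 bytes.
    return encode('UTF-8', $message);
}

# "Jürgen" as UTF-8 bytes: 7 bytes, but 6 characters once decoded.
print greet("J\xC3\xBCrgen"), "\n";
```

The same two calls apply at any boundary; only the place where the bytes come from or go to changes.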
As for getting the template data out of the database, you may be able to have this handled automatically for you. For example, in PostgreSQL, if your database is in UTF-8 encoding, you can set pg_enable_utf8 => 1 in the connect params and DBD::Pg will handle the encode() and decode() for you at the database interface layer.
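As a rough sketch of where that flag goes: the helper below only builds the argument list for `DBI->connect`, so it runs without a database. The helper name `pg_connect_args`, the database name, and the credentials are placeholders; only `pg_enable_utf8 => 1` comes from the comment above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for DBI->connect.
sub pg_connect_args {
    my ($dbname, $user, $password) = @_;
    my %attr = (
        pg_enable_utf8 => 1,   # let DBD::Pg handle UTF-8 at the driver layer
        RaiseError     => 1,   # die on database errors instead of returning undef
    );
    return ("dbi:Pg:dbname=$dbname", $user, $password, \%attr);
}

# Usage (needs DBI, DBD::Pg, and a reachable database):
#   use DBI;
#   my $dbh = DBI->connect(pg_connect_args('templates', 'web', 'secret'));
```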
I know a few things about Unicode. The templates don't reside in the database. They are in the filesystem and loaded from there.
They are loaded with the code snippet and then printed out via stdout to the webserver.
How does Text::Template know which encoding to apply on load?
How can I tell Text::Template that it should load and process a template with UTF-8 encoding?
Ervin
Ok, I misunderstood what you were saying originally.
Text::Template is not encoding aware currently. It would be nice if an encoding could be specified if the SOURCE is a filename.
In the meantime, one solution would be to open the template file yourself, apply the :encoding(UTF-8) layer to the filehandle, and pass the filehandle to Text::Template, e.g.:
open my $fh, '<', 'template.txt' or die "Cannot open template.txt: $!";
binmode $fh, ':encoding(UTF-8)';
my $t = Text::Template->new(TYPE => 'FILEHANDLE', SOURCE => $fh);
...
Ideally you should be able to just say:
Text::Template->new(SOURCE => 'whatever.txt', ENCODING => 'UTF-8')
I'll add that for the next version.
Can you tell me if this fixes the issue for you before I push this release to cpan?
Hello Michael,
I just cloned the repo and applied the module and encoding settings to my site. Result:
One thought: if the ENCODING setting is omitted, which would be better:
applying the default encoding from the locale (*n.x), or
keeping the old behaviour, with no encoding or decoding at all?
Many thanks and kind regards,
Ervin
(Weimar, Germany)
Given the age of this module and its deployed base, I am EXTREMELY reluctant to change its existing behavior if ENCODING is not given at the risk of breaking existing code.
Therefore, keeping the old behavior (that is, you get bytes not characters) is how it will continue to behave if no ENCODING is given.
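To illustrate the difference between "bytes" and "characters" mentioned above, here is a small sketch using only core modules. It reads the same file twice, once raw and once through the `:encoding(UTF-8)` layer. The `slurp` helper and the file content are made up for this demo, and `:raw` stands in for "no decoding at all":

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file through the given I/O layer (default: raw bytes).
sub slurp {
    my ($file, $layer) = @_;
    open my $fh, '<' . ($layer // ':raw'), $file
        or die "Cannot open $file: $!";
    local $/;                 # slurp mode: read everything at once
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Write "Käse" as UTF-8: the "ä" is stored as the two bytes C3 A4.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "K\xC3\xA4se";
close $tmp or die "Cannot close $path: $!";

my $bytes = slurp($path);                       # undecoded bytes
my $chars = slurp($path, ':encoding(UTF-8)');   # decoded characters

# prints "without layer: 5, with layer: 4"
printf "without layer: %d, with layer: %d\n", length $bytes, length $chars;
```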
I have been using this module for nearly 15 years now in some web services. One annoying thing is a character encoding issue. The module runs on a UTF-8 system with Perl and connects to a UTF-8 database. But when I fill a UTF-8 encoded template with some data and deliver it via Apache, the template parts get double encoded.
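One plausible way this kind of double encoding arises, sketched with the core Encode module: the template is read as undecoded bytes, the data is already a decoded character string, and the combined result is encoded once on output. The `render` helper and the sample strings are hypothetical and do not reproduce the actual code from this report:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical output step: encode once at the exit point.
sub render {
    my ($template, $name) = @_;
    return encode('UTF-8', $template . $name);
}

# Data that arrives as a decoded character string.
my $name = decode('UTF-8', "J\xC3\xBCrgen");

# Template text as raw UTF-8 bytes ("Grüße, "), never decoded.
my $raw_template     = "Gr\xC3\xBC\xC3\x9Fe, ";
my $decoded_template = decode('UTF-8', $raw_template);

# Mixing bytes with characters: each template byte above 0x7F is treated
# as a Latin-1 character and encoded again, so "ü" becomes four bytes.
my $broken  = render($raw_template, $name);
my $correct = render($decoded_template, $name);

# prints "20 vs 16 bytes"
printf "%d vs %d bytes\n", length $broken, length $correct;
```

Note that only the template part is damaged in `$broken`; the name, which was decoded properly, comes out correctly in both cases.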
I fixed it by adding a
binmode F, 'utf8';
in the template load function, but that is somewhat inconvenient. What is the correct way to handle that issue? Or is it a bug or a potential improvement?
example output function:
I use the Gentoo stable version, 1.460, and saw that the current load_text function has been modernized.
Kind regards,
Ervin