ducciodiblasi closed 5 years ago
Hmm, are you sure you've set the correct encoding? You may also have to pass some procEnv setting (see README). And does this also apply to the main HTML content of your page?
AFAIK htmlentities() does not handle every character, only those that have an HTML entity. So I'm hesitant to add this. I actually think it should work without htmlentities() if everything is set to UTF-8.
I printed out the $_REQUEST variable (the header is a header.php, not a header.html, so I can call file_put_contents()), and it came out without the è character. That's why I thought the issue was in the parameter itself, which is already corrupted by the time header.php reads it, and not a matter of charset (since file_put_contents() runs before the page is rendered).
From my tests, the character is intact until wkhtmltopdf is called. wkhtmltopdf in turn requests the header and footer, passing them the replace parameters as POST (or GET?), and maybe that is where the character gets corrupted.
You are right, htmlentities is not the best solution, since it is not complete and does not deal with all cases (for example, I had to add an nl2br after htmlentities to deal with newlines), and I would be hesitant too. Maybe the best approach is to extend your class to fit one's own needs, as I did.
I don't know all the conversion functions that PHP gives us; maybe there is something out there. I also tried urlencode, since these are HTTP requests, with no luck, but maybe I made some mistakes during the tests, I was in a hurry...
Just be aware of the issue (which seems to be in wkhtmltopdf rather than in your class).
regards
I'm a bit confused: our library has nothing at all to do with REQUEST, GET or POST parameters. It just creates a lengthy shell command string that executes wkhtmltopdf on the command line. As --replace params are passed on the command line, it's important that the shell environment can work with UTF-8.
Some (simple!) code example to reproduce the issue would be helpful.
I can understand the confusion, my config is a bit complicated.
My header is not a plain HTML file with JS; it is a PHP script that applies an XSL transformation to an XML file stored on the server filesystem.
When I call your library, I pass the header-html parameter as something like http://url/to/header.php, and when your library runs the shell command, wkhtmltopdf calls that URL, passing it the replace parameters via POST (or GET, I don't remember).
You're right, your library has nothing at all to do with REQUEST, GET or POST; that's why I said that the issue seems to be in wkhtmltopdf rather than in your class.
What do you mean by "it's important that the shell environment can work with UTF8"? What should be done to make sure of that?
Just for study I'll try to reproduce the issue with a shell command; if I succeed I'll tell you.
Regards
Here's the part from the README:
    $pdf = new Pdf(array(
        'commandOptions' => array(
            'procEnv' => array(
                // Check the output of 'locale -a' on your system to find supported languages
                'LANG' => 'en_US.utf-8',
            ),
        ),
    ));
I'm also not really an expert on this, but when a program is started it checks these environment variables. As I understand it, the LANG variable tells the program which character encoding is used, e.g. for shell arguments. You can run locale -a on your system (hopefully Linux?) to get a list of all values you can set there. With env | grep LANG you can find out whether you're currently using a UTF-8 locale. For me (Germany) it's LANG=de_DE.UTF-8.
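The check described above can be sketched as a small shell script. This is only an illustration, not part of the library: `is_utf8_locale` and `effective_ctype` are hypothetical helper names, and the script assumes a POSIX shell with `locale` available. It relies on the usual precedence in which `LC_ALL` overrides `LC_CTYPE`, which in turn overrides `LANG`.

```shell
#!/bin/sh
# Succeed if the given locale name advertises a UTF-8 charset,
# e.g. en_US.UTF-8 or de_DE.utf8. Names such as C or POSIX fail.
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the locale that decides the character encoding:
# LC_ALL wins over LC_CTYPE, which wins over LANG.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

if is_utf8_locale "$(effective_ctype)"; then
    echo "UTF-8 locale active: $(effective_ctype)"
else
    echo "no UTF-8 locale active; candidates from 'locale -a':"
    locale -a 2>/dev/null | grep -i -E 'utf-?8' || echo "(none found)"
fi
```

If the script reports that no UTF-8 locale is active, one of the listed candidates is probably what belongs in the LANG entry of procEnv.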
It would be interesting to see what happens if you call wkhtmltopdf with some UTF-8 character on the command line, for example:
    wkhtmltopdf \
        --header-center 'test {x}' \
        --replace '{x}' 'è' \
        index.html out.pdf
If this works, then you should be able to pass the right LANG in procEnv as shown above.
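To try the effect of LANG outside PHP, the variable can be set for a single command, which is roughly what procEnv does for the spawned process. The following is a sketch under assumptions: `pick_utf8_locale` is a hypothetical helper that takes the first UTF-8 entry from a `locale -a` style list on stdin, and the run is skipped unless wkhtmltopdf and an index.html are present in the current directory.

```shell
#!/bin/sh
# Read locale names on stdin (one per line) and print the first one
# whose name ends in .utf8 or .UTF-8. Prints nothing if none match.
pick_utf8_locale() {
    grep -i -E '\.utf-?8$' | head -n 1
}

utf8_locale="$(locale -a 2>/dev/null | pick_utf8_locale || true)"

if [ -n "$utf8_locale" ] && command -v wkhtmltopdf >/dev/null 2>&1 && [ -f index.html ]; then
    # LANG is set only for this one invocation.
    LANG="$utf8_locale" wkhtmltopdf \
        --header-center 'test {x}' \
        --replace '{x}' 'è' \
        index.html out.pdf
else
    echo "skipped: need a UTF-8 locale, wkhtmltopdf and index.html"
fi
```

Taking the first match is arbitrary; any UTF-8 locale from the list should behave the same as far as argument encoding is concerned.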
Oh, and to take out some complexity, start with a simple script where you focus on the PDF part. Just pass the values directly instead of some nifty XSL transformation and whatever; otherwise you'll go insane trying to find which part of your setup is really causing the issue.
I'm pretty sure that with the right locale settings everything works.
Note to self: Add pdfbox to the test environment so we can extract texts from created PDF files and verify that UTF-8 works.
Ok, I couldn't make it work either. But this seems to be a bug in wkhtmltopdf itself: https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2427
Here's a simple test:
wkhtmltopdf --header-left 'äö [x]' --replace 'x' 'üüü' index.html out.pdf
While the äö works fine, the üüü in the replace argument doesn't.
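One way to rule out the shell as the culprit is to look at the raw bytes of the argument before wkhtmltopdf ever sees it. The sketch below uses a hypothetical helper, `hex_bytes`, built only from `printf`, `od` and `tr`; it assumes the script file itself is saved as UTF-8.

```shell
#!/bin/sh
# Print the bytes of the first argument as lowercase hex, without separators.
hex_bytes() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

# In UTF-8, ü is the two bytes c3 bc and è is c3 a8.
# A Latin-1 encoded script would give a single byte instead (fc and e8).
hex_bytes 'üüü'; echo
hex_bytes 'è'; echo
```

If the output shows c3bc three times for the first line, the shell is handing over valid UTF-8, which would point at the later stage described in the linked wkhtmltopdf issue rather than at the locale.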
Hi, I've just resolved an issue with the --replace option for your wrapper. When I passed a special character (for example è) in the replace parameter, it was passed incorrectly to the footer, so when I converted it to uppercase with the CSS directive text-transform: uppercase;, the character disappeared. Since wkhtmltopdf is just a browser under the hood, when you pass values in the replace parameter you have to convert special UTF-8 characters to HTML entities if you want them to be passed correctly as POST when it makes the header and footer calls. I have derived from your pdf class and overridden the following function:
Hope this helps.
Regards