Unicode characters appear as ????? in script-generated header/footer

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

Run the following PHP code (also attached):

<?php

$html = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

<body>

<script type="text/php">
if ( isset($pdf) ) {
  $font = Font_Metrics::get_font("DejaVu Sans Condensed");
  $color = array(0, 0, 0);
  $headerHeight = Font_Metrics::get_font_height($font, 11);
  $pdf->page_text(50, 50, "Персонализиране хедър", $font, 11, $color);
}
</script>

<div style="font-family: DejaVu Sans Condensed, monospace; font-size: 11px;">
Персонализиране хедър
</div>
</body>
</html>
';

include_once('dompdf/dompdf_config.inc.php');
$dompdf = new DOMPDF();
$dompdf->load_html($html);
$dompdf->render();
$dompdf->stream('unicode.pdf', array('Attachment' => 1));

?>

What is the expected output? What do you see instead?

The header text should appear as Персонализиране хедър (Bulgarian, roughly
"customize header"). Instead, it appears as ??????????????? ?????

What version of the product are you using? On what operating system?

I'm running the trunk version on Linux/Ubuntu, PHP 5.3.5.

Please provide any additional information below.

I have the latest trunk version of DOMPDF, configured for Unicode, and Unicode
characters display correctly in the HTML part of the document. However, I
cannot find any way to get Unicode characters into the header/footer (see the
attached code sample). I tried various combinations of utf8_encode() and
mb_convert_encoding() in various places, with no luck, so I concluded that this
must be a defect in the DOMPDF/CPDF class and reported it as such.

Original issue reported on code.google.com by freecorv...@gmail.com on 6 Jul 2011 at 11:23

Attachments:

dompdf.html

GoogleCodeExporter commented 9 years ago

Hello, since you are using the SVN version, you can use another method to
create headers and footers:

<div id="header">My header</div>

with this style :

#header {
  position: fixed; 
  top: 50px; 
  left: 50px;
}

You can also add a page number to this element (or to any other element, such
as a footer div positioned with the "bottom" property):

#footer:after {
  content: "Page " counter(page);
}

See the CSS/content and CSS/position_fixed examples here:

   http://pxd.me/dompdf/www/examples.php
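The pieces above can be combined into a single page. This is a sketch, not the dompdf examples linked above: `build_document()` is a hypothetical helper, and the font and offsets are taken from the original report and the CSS in this comment. Because the header is ordinary HTML rather than inline-script output, its text is rendered like the rest of the body, which the reporter already saw handling Unicode correctly.

```php
<?php
// Hypothetical helper (not part of dompdf): wrap body content in a page whose
// header and footer are position:fixed elements, so they repeat on every page.
function build_document(string $headerText, string $bodyHtml): string {
    // Escape markup characters only; UTF-8 text such as Cyrillic passes through.
    $header = htmlspecialchars($headerText, ENT_QUOTES, 'UTF-8');
    return <<<HTML
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style>
  body { font-family: "DejaVu Sans Condensed", monospace; font-size: 11px; }
  #header { position: fixed; top: 50px; left: 50px; }
  #footer { position: fixed; bottom: 50px; left: 50px; }
  #footer:after { content: "Page " counter(page); }
</style>
</head>
<body>
<div id="header">{$header}</div>
<div id="footer"></div>
{$bodyHtml}
</body>
</html>
HTML;
}

$html = build_document('Персонализиране хедър', '<p>Body text</p>');

// With dompdf available, render it the same way as in the original report:
// $dompdf = new DOMPDF();
// $dompdf->load_html($html);
// $dompdf->render();
```

The fixed elements are placed before the body content so they are laid out on the first page and then repeated.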

Original comment by fabien.menager on 7 Jul 2011 at 6:04

Changed state: Done

GoogleCodeExporter commented 9 years ago

This is great, worked perfectly! Thank you.

Original comment by freecorv...@gmail.com on 7 Jul 2011 at 5:31

GoogleCodeExporter commented 9 years ago

The reason you cannot use characters outside the ISO-8859-1 encoding is that
the PHP evaluator class first passes any inline PHP through the utf8_decode()
function. This function converts a UTF-8 string to ISO-8859-1, and any
character with no ISO-8859-1 equivalent is replaced with a question mark (?).
See here: http://us2.php.net/manual/en/function.utf8-decode.php
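A minimal sketch of what this does to the header text from the original report. It exercises only the PHP function, not dompdf itself, and assumes a PHP version that still provides utf8_decode() (deprecated as of PHP 8.2, so the @ operator silences the notice).

```php
<?php
// Header text from the original report (UTF-8 source).
$header = 'Персонализиране хедър';

// Each Cyrillic letter is two bytes in UTF-8 and has no ISO-8859-1
// equivalent, so each one collapses to a single '?'; the ASCII space survives.
$latin1 = @utf8_decode($header);

echo $latin1, "\n"; // prints: ??????????????? ?????
```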

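You can work around this limitation by hex-encoding the strings in your script.
The escaped source is plain ASCII, so it passes through utf8_decode()
unchanged, and PHP turns the \xNN escapes back into the original UTF-8 bytes
when it evaluates the script. Of course, this means you have to generate your
inline script dynamically and convert any problem characters (or know how to
write in hex; fileformat.info is your friend if you want to try).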

As an example of converting a string in PHP, the following code converts every
byte of the UTF-8 string $str into a \xNN escape sequence:

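  $str = 'Персонализиране';
  $hexstr = '';
  // strlen() counts bytes, so this walks the UTF-8 encoding byte by byte
  for ($i = 0; $i < strlen($str); $i++) {
    // Always emit two hex digits, so a following character that happens to
    // be a hex digit is not absorbed into the escape
    $hexstr .= sprintf('\\x%02x', ord($str[$i]));
  }
  echo $str, ' = ', $hexstr, "\n";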
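To check why the workaround holds up, the sketch below builds a double-quoted literal from \xNN escapes, passes it through utf8_decode() as the evaluator would, and then evaluates it. `to_hex_literal()` is a hypothetical helper, and calling eval() directly only approximates dompdf's evaluator; it is not dompdf's code. As before, @ silences the PHP 8.2+ deprecation notice for utf8_decode().

```php
<?php
// Hypothetical helper: turn a UTF-8 string into a double-quoted PHP string
// literal made only of \xNN escapes (pure ASCII source text).
function to_hex_literal(string $str): string {
    $out = '';
    for ($i = 0; $i < strlen($str); $i++) {
        $out .= sprintf('\\x%02x', ord($str[$i]));
    }
    return '"' . $out . '"';
}

$text = 'Персонализиране хедър';
$literal = to_hex_literal($text);

// The literal is plain ASCII, so utf8_decode() leaves it unchanged.
$decodedSource = @utf8_decode($literal);

// Evaluating the literal turns the escapes back into the original UTF-8 bytes.
$roundTrip = eval('return ' . $decodedSource . ';');

echo $literal, "\n";
var_dump($decodedSource === $literal, $roundTrip === $text);
```

The escapes only work inside a double-quoted string, which is why the helper wraps its output in double quotes.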

If you paste the resulting hex string into a double-quoted string in your
inline script (single-quoted strings do not interpret \x escapes), you will get
the expected output. See here:
http://eclecticgeek.com/dompdf/debug.php?identifier=a6592dbc0b2f300737cb39d9fedef89b

---

I'm reopening this issue because I think we should decide whether passing the
inline script through utf8_decode() is necessary at all. Fabien, do you know of
any reason for this conversion? I didn't see any problems after a cursory test
(I even used a multi-byte character in a variable name). However, I was not
thorough, and I can imagine problems arising when PHP parses multi-byte
characters as part of the code (i.e. not as variable content). On the other
hand, nobody should be processing PHP code that they did not write themselves,
so any harm should be limited.
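The cursory test mentioned above can be approximated as follows. This is a sketch, not the test that was actually run: it evaluates inline-script source containing a Cyrillic variable name both as-is and after utf8_decode() (assuming utf8_decode() is available; @ silences the PHP 8.2+ deprecation notice).

```php
<?php
// Inline-script source with a multi-byte variable name and string literal.
$source = '$заглавие = "Персонализиране хедър"; return $заглавие;';

// Evaluated as-is: PHP accepts bytes 0x80-0xff in identifiers, so the code
// runs and returns the UTF-8 string intact.
$direct = eval($source);

// Passed through utf8_decode() first: the identifier becomes "$" followed by
// question marks, which is not valid PHP, so eval() throws a ParseError (PHP 7+).
try {
    eval(@utf8_decode($source));
    $decodedFails = false;
} catch (ParseError $e) {
    $decodedFails = true;
}

var_dump($direct === 'Персонализиране хедър', $decodedFails);
```

In this sketch the multi-byte identifier only causes trouble when utf8_decode() is applied, not when the source is evaluated unchanged.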

Original comment by eclecticgeek on 14 Jul 2011 at 3:52

Changed state: Accepted
Added labels: Milestone-Release0.6

GoogleCodeExporter commented 9 years ago

Original comment by eclecticgeek on 9 Jan 2012 at 8:27

Changed title: Unicode characters appear as ????? in script-generated header/footer

GoogleCodeExporter commented 9 years ago

Since there have been no objections or further thoughts, I modified the way
inline scripts are handled so that they are no longer passed through
utf8_decode(). If anyone sees any problems caused by this change, please post a
follow-up here.

(addressed in r491)

Original comment by eclecticgeek on 16 Apr 2012 at 10:21

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Confirming that the original problem is fixed in trunk rev #491. Thanks!

Original comment by freecorv...@gmail.com on 17 Apr 2012 at 7:46

GoogleCodeExporter commented 9 years ago

Issue 517 has been merged into this issue.

Original comment by eclecticgeek on 18 Jul 2012 at 2:25

GoogleCodeExporter commented 9 years ago

Original comment by eclecticgeek on 30 May 2013 at 5:16

Added labels: Restrict-AddIssueComment-Commit

vikrambalye / dompdf

Unicode characters appear as ????? in script-generated header/footer #320