Closed GoogleCodeExporter closed 9 years ago
Problem solved.
Instead of
...
def _escape(self, s):
    # Add \ before \, ( and )
    return s.replace('\\','\\\\').replace(')','\\)').replace('(','\\(')
...
use
...
def _escape(self, s):
    # Add \ before \, ( and ), and write a carriage return as \r
    return s.replace('\\', '\\\\').replace(')', '\\)').replace('(', '\\(').replace('\r', '\\r')
...
You may also compare to function _escape() in tfpdf
(http://fpdf.org/en/script/script92.php).
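To illustrate why the extra replace matters, here is a minimal standalone sketch. It assumes Python 3 byte strings rather than the Python 2 strings pyfpdf used at the time, and the name escape_pdf_literal is hypothetical (it is not part of pyfpdf). It shows that U+010D encoded as UTF-16BE contains the byte 0x0D, which is a raw carriage return unless it is escaped.

```python
def escape_pdf_literal(data: bytes) -> bytes:
    """Escape bytes for use inside a PDF literal string ( ... ).

    Hypothetical helper for illustration only, not pyfpdf's own code.
    The backslash is replaced first so that the backslashes added by
    the later replacements are not doubled again.
    """
    return (
        data.replace(b"\\", b"\\\\")
        .replace(b")", b"\\)")
        .replace(b"(", b"\\(")
        .replace(b"\r", b"\\r")
    )


# U+010D in UTF-16BE is the byte pair 0x01 0x0D; 0x0D is a carriage return.
raw = "\u010d".encode("utf-16-be")
escaped = escape_pdf_literal(raw)

print(raw)      # b'\x01\r'
print(escaped)  # b'\x01\\r'
```

With the escape in place, the output stream contains the two characters backslash and r instead of a bare carriage return byte.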
Would you be so kind as to check this issue? Perhaps this modification could be
merged into the sources.
Original comment by edwin.ce...@liebherr.com
on 22 Jan 2013 at 1:45
Are you sure that a \x0d\x0a pair in the source text leads to this issue?
Does anybody know a PDF reader for Linux to test this issue with? (other than
Adobe Reader)
Original comment by romiq...@gmail.com
on 22 Jan 2013 at 2:00
1. Are you sure that a \x0d\x0a pair in the source text leads to this issue?
Yes, really sure. You can create a PDF containing only these symbols:
"""
U+010A -- Ċ -- Latin Capital Letter C with dot above
U+010D -- č -- Latin Small Letter C with caron
U+010E -- Ď -- Latin Capital Letter D with caron
""".
Do it once with pyfpdf and once with tfpdf (under PHP). Compare the resulting
PDF documents with a hex diff tool (I used gvim on Windows). I recommend turning
compression off.
Then you may compare the sources of tfpdf's function _escape() to pyfpdf's def
_escape().
Additionally you may search fpdf.py for "txt2 = self._escape(UTF8ToUTF16BE(txt,
False))" and log the text before and after UTF8ToUTF16BE and after
self._escape. You may do this once in pyfpdf and once in tfpdf.
I checked the new pdf (with \\r instead of \r)
on Windows with
- Adobe Acrobat Reader XI,
- FoxIt Reader 5.4,
- Google Chrome,
on Linux with
- evince,
- gimp import,
- xpdf
on HP-UX with
- Adobe Acrobat Reader 5.0,
and on Android with
- Polaris office
All readers show symbol U+010D as expected.
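A likely explanation for why the escape is needed: the PDF reference states that an unescaped end-of-line marker (CR, LF, or CR LF) inside a literal string is read as a single 0x0A byte. The sketch below models that rule from the reader's side. It is a simplified illustration, not any real viewer's parser: read_pdf_literal is a hypothetical name, and octal escapes and backslash line continuations are deliberately left out.

```python
# Single-character escapes defined for PDF literal strings.
SIMPLE_ESCAPES = {
    ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09,
    ord("b"): 0x08, ord("f"): 0x0C,
    ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C,
}


def read_pdf_literal(body: bytes) -> bytes:
    """Decode the bytes between ( and ) of a PDF literal string.

    Simplified: no octal escapes, no line continuation handling.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c == 0x5C and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
                i += 2
            else:
                i += 1  # unknown escape: drop the backslash only
        elif c == 0x0D:
            # Unescaped CR or CR LF is read as one LF byte.
            out.append(0x0A)
            i += 2 if body[i + 1:i + 2] == b"\n" else 1
        else:
            out.append(c)
            i += 1
    return bytes(out)


without_escape = read_pdf_literal(b"\x01\r").decode("utf-16-be")
with_escape = read_pdf_literal(b"\x01\\r").decode("utf-16-be")

print(hex(ord(without_escape)))  # 0x10a
print(hex(ord(with_escape)))     # 0x10d
```

Under this model the unescaped byte pair comes back as U+010A, while the escaped form round-trips to U+010D.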
2. Does anybody know a PDF reader for Linux to test this issue with? (other
than Adobe Reader)
I used xpdf and evince.
Original comment by edwin.ce...@liebherr.com
on 22 Jan 2013 at 3:25
Edwin, I am still not sure that escaping is the right way.
But the references (both 1.3 and 1.7) do not specify how to properly integrate a
Unicode string literal into a text object (i.e. mixing a 7-bit string with UTF-16BE).
The reference specifies a BOM only for a whole text string, not for a single text literal.
Reference 1.3 also complains about encrypted 8-bit literals, but who cares.
Please test this patch (a small refactoring plus the proposed change).
1. http://www.adobe.com/devnet/pdf/pdf_reference_archive.html
Original comment by romiq...@gmail.com
on 23 Jan 2013 at 7:22
Hello,
unfortunately I worked on this issue not by forward engineering but by
reverse engineering (just to get a solution in an acceptable amount of time).
Therefore I did not check the 700-page 1.3 spec beforehand.
I generated the document with Python 2.7, pyfpdf 1.7 and your patch. Symbol
U+010D (and also all symbols from Latin-1 Supplement and Latin Extended-A) is
shown as expected with these viewers:
on Windows 7 with
- Adobe Acrobat Reader XI,
- Adobe Acrobat Pro X,
- FoxIt Reader 5.4,
- Google Chrome,
on Linux with
- evince,
- gimp import,
- xpdf,
on HP-UX with
- Adobe Acrobat Reader 5.0.
Original comment by edwin.ce...@liebherr.com
on 23 Jan 2013 at 11:10
Edwin, the 1.7 reference is about 1300 pages, by the way :) I also haven't read
the full spec carefully yet.
I tested this patch with the DejaVuSansCondensed, DroidSans and Ubuntu-R fonts in
* evince (poppler library)
* atril (evince fork)
* Adobe Reader for android (This app also show U+010A instead of U+010D)
Looks nice.
If there are no objections, I'll merge this patch.
Original comment by romiq...@gmail.com
on 23 Jan 2013 at 11:27
I appreciate it.
Thank you!
Original comment by edwin.ce...@liebherr.com
on 23 Jan 2013 at 12:25
Committed.
Edwin, thank you for the report and the proposed solution.
Original comment by romiq...@gmail.com
on 25 Jan 2013 at 5:30
Original issue reported on code.google.com by
edwin.ce...@liebherr.com
on 17 Jan 2013 at 3:10