pauln / tcpdi_parser

Parser for use with TCPDI, based on TCPDF_PARSER
GNU Lesser General Public License v3.0

Comments in trailer cause parser to fail #9

Closed. audiomason closed this issue 8 years ago

audiomason commented 8 years ago

This trailer section causes parsing to fail with an "Unable to find trailer" error:

trailer
<</Root 71 0 R/ID [<301161dc6cb4aa570159c409124baab9>]/Info 72 0 R/Size 73>>
%comments-here
startxref
143557
%%EOF

The regex pattern on line 408 can be changed to allow comment lines.

old:

'/trailer[\s]<<(.)>>[\s][\r\n]+startxref[\s][\r\n]+/isU'

new:

'/trailer[\s]<<(.)>>[\s][\r\n]+(?:[%].[\r\n])startxref[\s][\r\n]+/isU'

I'm not 100% certain that comments are allowed in the trailer section, but I've encountered some files that have them. These files appear to have been created by iText.
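To make the difference between the two patterns concrete, here is a minimal Python sketch that runs both against the trailer quoted above. Several things in it are assumptions rather than the parser's actual code: the quantifier asterisks that the comment formatting swallowed are put back where they appear to belong, the PCRE `/U` (ungreedy) modifier is emulated by writing each quantifier in its lazy form, and the names `OLD`, `NEW`, `SAMPLE`, and `trailer_dict` are purely illustrative. The pattern merged into the parser may differ in detail.

```python
import re

# /i and /s map directly onto Python flags. /U (ungreedy) has no Python
# equivalent, so every quantifier below is written in its lazy form.
FLAGS = re.IGNORECASE | re.DOTALL

# Assumed reconstruction of the old pattern, asterisks restored.
OLD = re.compile(
    r"trailer[\s]*?<<(.*?)>>[\s]*?[\r\n]+?startxref[\s]*?[\r\n]+?",
    FLAGS,
)

# Assumed reconstruction of the proposed pattern: the extra non-capturing
# group allows zero or more "%..." comment lines before startxref.
NEW = re.compile(
    r"trailer[\s]*?<<(.*?)>>[\s]*?[\r\n]+?(?:[%].*?[\r\n])*?startxref[\s]*?[\r\n]+?",
    FLAGS,
)

# The trailer from the report, one keyword per line.
SAMPLE = (
    "trailer\n"
    "<</Root 71 0 R/ID [<301161dc6cb4aa570159c409124baab9>]/Info 72 0 R/Size 73>>\n"
    "%comments-here\n"
    "startxref\n"
    "143557\n"
    "%%EOF\n"
)


def trailer_dict(pattern, data):
    """Return the text between << and >> of the trailer, or None if not found."""
    match = pattern.search(data)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("old pattern:", trailer_dict(OLD, SAMPLE))
    print("new pattern:", trailer_dict(NEW, SAMPLE))
```

With the sample above, the old pattern finds nothing because `%comments-here` sits between the dictionary and `startxref`, while the new pattern captures the dictionary contents. A trailer without any comment line still matches both patterns, since the added group is optional.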

pauln commented 8 years ago

Thanks for the detailed report! Most of the issues which crop up are due to similar situations - things which aren't allowed (or aren't explicitly allowed, anyway) in the spec, but which various pieces of software do anyway (and PDF readers just deal with).

Is there any chance of you providing a sample PDF which exhibits this issue, so I can add it to my collection of "broken" PDFs for regression testing? I'd rather not have to try concocting one myself...

pauln commented 8 years ago

@audiomason Don't worry about supplying a sample file - I managed to insert a comment (as per your example) into an existing PDF and replicate this issue with minimal effort. Your fix looks good - do you want to submit a pull request, so you get proper credit for the fix?
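The thread does not say how the comment was inserted into the existing PDF. One possible way to build such a regression fixture is sketched below in Python; the helper name `insert_trailer_comment` and the default comment text are hypothetical. It assumes a PDF with a classic cross-reference table, where the bytes added just before the final `startxref` keyword come after the table and the trailer dictionary, so the offset recorded after `startxref` should still point at the table.

```python
def insert_trailer_comment(pdf: bytes, comment: bytes = b"comments-here") -> bytes:
    """Return a copy of `pdf` with a '%comment' line before the last startxref.

    Hypothetical helper for building a test fixture; it does not validate
    the PDF in any other way.
    """
    pos = pdf.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword found")
    # Everything before `pos` keeps its byte offset, so object and xref
    # offsets stored in the file are unaffected by the insertion.
    return pdf[:pos] + b"%" + comment + b"\n" + pdf[pos:]


if __name__ == "__main__":
    tail = b"trailer\n<</Size 73>>\nstartxref\n143557\n%%EOF\n"
    print(insert_trailer_comment(tail).decode("ascii"))
```

In practice the input would be the full bytes of a real PDF (for example, read with `open(path, "rb")`), and the result would be written to a new file in the test collection.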

audiomason commented 8 years ago

@pauln I just submitted a pull request.

(Also realized that the asterisks disappeared from my original comment. They are in the pull request.)

pauln commented 8 years ago

Thanks - merged! And yes, I noticed the disappearing asterisks - they italicised part of the regex. I was able to see where they should go, and tested the regex with your change - hence saying that it looks good :-)