sambitdash / PDFIO.jl

PDF Reader Library for Native Julia.

Getting assertion error in show_text_layout! #81

Closed sebastianpech closed 4 years ago

sebastianpech commented 4 years ago

I have a PDF that fails at this assertion: https://github.com/sambitdash/PDFIO.jl/blob/eeff74cd01dd29839465bb070b63c929bacd5e16/src/PDPageElement.jl#L618

fails.pdf

sebastianpech commented 4 years ago

This happens when extracting the PDF's text.

sambitdash commented 4 years ago

Extremely small characters affect the formatting, so they must be ignored. The assertion was introduced to keep a check on them. In this case, a text run contains 0x200b (U+200B), the Unicode zero-width space character, which leads to the assertion failure. Such text runs should be ignored rather than asserted on.
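
As a rough illustration of the idea (this is not the code from the actual fix), the sketch below filters out negligibly narrow text runs instead of asserting on their width. The `TextRun` struct, the `MIN_WIDTH` threshold, and the `visible_runs` helper are all hypothetical names made up for this example; they are not part of PDFIO.jl.

```julia
# Hypothetical stand-in for a text run; NOT PDFIO.jl's actual type.
struct TextRun
    text::String
    width::Float64   # rendered width in page units
end

# Assumed threshold below which a run is treated as having no visible extent.
const MIN_WIDTH = 1e-3

# Keep only runs wide enough to affect layout, rather than asserting
# that every run has a positive width.
visible_runs(runs) = filter(r -> r.width > MIN_WIDTH, runs)

runs = [TextRun("Hello", 25.0), TextRun("\u200b", 0.0), TextRun("world", 27.5)]
kept = visible_runs(runs)
println(join((r.text for r in kept), " "))
```

Here the zero-width space run is dropped silently, so the remaining runs can be laid out without tripping a width check.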

sambitdash commented 4 years ago

https://github.com/sambitdash/PDFIO.jl/commit/759200f14f637eb87e3bec5e49de0695009c7504 should fix this issue.

sebastianpech commented 4 years ago

Works! Thanks for your effort. I have more to come and will open an issue later.