Open coezbek opened 3 years ago
Thanks for a great sample file that demonstrates the issue.
I am wondering: is this something that pdf-reader is intended to do accurately?
I would classify it as a known issue that I'd like to handle better than we currently do. Probably the algorithm in PageLayout needs a significant overhaul, which is a bummer.
When reading text from a document that uses different font sizes on the same line of text, I have seen extraction fail in two ways: extra spaces are inserted, and characters are overwritten. I am wondering: is this something that pdf-reader is intended to do accurately?
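To make the two failure modes concrete, here is a toy model of how placing text runs on a fixed-width character grid could produce both of them. This is only a sketch under stated assumptions, not pdf-reader's actual PageLayout code: `Run`, `layout_line`, the x positions, and the glyph widths (6 units for the large font, 4 for the small one) are all made up for illustration.

```ruby
# Hypothetical text run: x is the horizontal start position in arbitrary units.
Run = Struct.new(:x, :text)

# Assumed glyph widths: 6 units for the large font, 4 units for the small one.
# "H" and "W" are set in the large font, "ELLO" and "ORLD" in the small font,
# with one small-font space (4 units) between the two words.
RUNS = [
  Run.new(0,  "H"),     # large font, occupies 0...6
  Run.new(6,  "ELLO"),  # small font, occupies 6...22
  Run.new(26, "W"),     # large font, occupies 26...32
  Run.new(32, "ORLD")   # small font, occupies 32...48
].freeze

# Place each run on a grid that assumes one column width for the whole line.
def layout_line(runs, col_width)
  line = String.new
  runs.each do |run|
    col = run.x / col_width                       # integer division: grid column
    line << " " * (col - line.length) if col > line.length  # pad up to the column
    line[col, run.text.length] = run.text         # overwrites anything already there
  end
  line
end

# Grid sized for the large font: small glyphs take more columns than they
# have room for, so a later run lands on top of an earlier character.
puts layout_line(RUNS, 6)

# Grid sized for the small font: large glyphs leave unused columns behind
# them, which show up as extra spaces.
puts layout_line(RUNS, 4)
```

In this model, no single column width is right for a line that mixes font sizes, which is roughly why both symptoms can show up for the same kind of input.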
Example file: "hello_world_caps.pdf"
Example spec (fails):