tdclemens / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/

initial work on word positioning and pdf spacing problem #7

Closed tdclemens closed 10 years ago

tdclemens commented 10 years ago

This is a super rushed solution for grouping words into spans and solving the spacing problem.

To group characters on a line into words:

What it does well:

What it does badly:

I have a similar solution in mind that will most likely work better. I will write up an issue for it next.