tdclemens / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/

Enhance Word Positioning #4

Open tdclemens opened 11 years ago

tdclemens commented 11 years ago

Word positioning is slightly off when the text is justified.

Traditionally pdf2htmlEX uses spans to offset characters in order to preserve PDF positioning.

One option is to change this behavior, or to provide a command-line option that positions each word on a line absolutely. This would improve word selection, since the current positioning spans sometimes fall in the middle of a word.
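To make the proposal concrete, here is a rough sketch of what per-word absolute positioning could look like in the generated HTML. This is not pdf2htmlEX code or its actual output format: the function name `render_line`, the inline styles, and the input shape (a list of word text and x-offset pairs) are all assumptions made for illustration.

```python
from html import escape


def render_line(words, top, unit="px"):
    """Render one text line as a div containing one absolutely positioned span per word.

    Hypothetical helper, not part of pdf2htmlEX.
    words: list of (text, left) pairs, where left is the word's x offset within the line.
    top:   y offset of the line within the page.
    """
    spans = "".join(
        # Each word gets its own span, so no positioning element splits a word.
        f'<span style="position:absolute;left:{left:.2f}{unit}">{escape(text)}</span>'
        for text, left in words
    )
    # The line div is itself positioned, so the spans' offsets are relative to it.
    return f'<div style="position:absolute;top:{top:.2f}{unit}">{spans}</div>'
```

Calling `render_line([("Hello", 0), ("world", 42.5)], top=100)` would produce a line with two spans, one per word. Since every span wraps a whole word, selecting a word in the browser never lands on a span boundary inside it, at the cost of more markup per line.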