jwilk opened this issue 16 years ago
Text extraction was indeed broken. I fixed it, but rotated text is still extracted incorrectly. That's probably because of a DjVuLibre bug.
pdftotext handles rotated text fine, so reimplementing its algorithm (rather than relying on DjVuLibre) would solve the problem:
$ pdftotext rotated-lorem.pdf - | grep L
Lorem ipsum
Lorem ipsum
$ pdf2djvu -q rotated-lorem.pdf | djvutxt - | grep L
Lorem ipsum
Loremipsum
Attachment: rotated-lorem.pdf
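To illustrate what such a reimplementation involves, here is a minimal Python sketch. It is not pdftotext's actual algorithm. It measures the gaps between glyphs along each glyph's own baseline direction rather than along the page's x axis, so word spaces survive rotation. The `Glyph` record, the `layout` test helper and the 0.3 space threshold are hypothetical. Ignoring the rotation collapses a vertical line into a single word, a symptom resembling the `Loremipsum` output above. Whether DjVuLibre fails for this exact reason is an assumption.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class Glyph:
    """Hypothetical glyph record: origin in page coordinates plus baseline angle."""
    char: str
    x: float
    y: float
    advance: float  # glyph width measured along the baseline
    angle: float    # baseline direction in degrees, counter-clockwise


def baseline_position(g: Glyph) -> float:
    """Distance of the glyph origin along its baseline (rotate by -angle, keep x)."""
    a = math.radians(g.angle)
    return g.x * math.cos(a) + g.y * math.sin(a)


def group_words(glyphs: list[Glyph], space_ratio: float = 0.3) -> list[str]:
    """Join the glyphs of one text line into words, starting a new word when the
    gap along the baseline exceeds space_ratio * average advance."""
    if not glyphs:
        return []
    ordered = sorted(glyphs, key=baseline_position)
    threshold = space_ratio * sum(g.advance for g in ordered) / len(ordered)
    words, current = [], ordered[0].char
    prev_end = baseline_position(ordered[0]) + ordered[0].advance
    for g in ordered[1:]:
        start = baseline_position(g)
        if start - prev_end > threshold:
            words.append(current)
            current = ""
        current += g.char
        prev_end = start + g.advance
    words.append(current)
    return words


def layout(text: str, x0: float, y0: float, angle: float,
           advance: float = 5.0, space: float = 3.0) -> list[Glyph]:
    """Hypothetical test helper: place text along a baseline rotated by angle."""
    a = math.radians(angle)
    glyphs, d = [], 0.0
    for ch in text:
        if ch == " ":
            d += space  # no glyph is emitted for a space, only a gap
            continue
        glyphs.append(Glyph(ch, x0 + d * math.cos(a), y0 + d * math.sin(a), advance, angle))
        d += advance
    return glyphs


if __name__ == "__main__":
    rotated = layout("Lorem ipsum", 100, 100, 90)
    print(group_words(rotated))  # ['Lorem', 'ipsum']
    # Pretending the text is unrotated measures gaps along the page x axis instead.
    print(group_words([replace(g, angle=0) for g in rotated]))  # ['Loremipsum']
```

A real extractor would also split glyphs into lines, which could be done by using the perpendicular coordinate. This sketch assumes a single line.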
Issue reported by gaiason@yahoo.com at Google Code:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
I expect (and hope) to see the rotated text and its text coordinates captured and displayed correctly. Instead I see one big lump of text, with the text coordinates set to the start and end of the whole block of rotated text.
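The coordinate symptom can be sketched the same way. This assumes two things: the text layer stores one axis-aligned rectangle per word (as the `(word xmin ymin xmax ymax "...")` zones in djvused's text output suggest), and the page uses a y-up coordinate system. Under those assumptions, a minimal Python sketch computes a separate box for each rotated word from its four corners, instead of one box spanning the whole block. The `word_bbox` helper and its parameters are hypothetical.

```python
import math


def word_bbox(x0: float, y0: float, length: float, height: float,
              angle: float) -> tuple[float, float, float, float]:
    """Hypothetical helper: axis-aligned (xmin, ymin, xmax, ymax) box around a word
    whose baseline starts at (x0, y0), runs `length` units in direction `angle`
    (degrees, counter-clockwise, y axis pointing up) and whose glyphs rise
    `height` units perpendicular to the baseline."""
    a = math.radians(angle)
    dx, dy = math.cos(a), math.sin(a)  # unit vector along the baseline
    nx, ny = -dy, dx                   # unit vector perpendicular to it ("up")
    # The four corners of the rotated word rectangle.
    corners = [
        (x0 + u * dx + v * nx, y0 + u * dy + v * ny)
        for u in (0.0, length)
        for v in (0.0, height)
    ]
    xs = [cx for cx, _ in corners]
    ys = [cy for _, cy in corners]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Two words on a baseline rotated by 90 degrees, each 25 units long and 8 tall.
    print(word_bbox(100, 100, 25, 8, 90))  # box for the first word
    print(word_bbox(100, 128, 25, 8, 90))  # separate box for the second word
```

Each word then gets its own zone, rather than sharing coordinates that stretch from the start to the end of the rotated block.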
What version of the product are you using? On what operating system?
PDF to DjVu GUI versions 1.0 and 1.1 on Windows XP.
Please provide any additional information below.
http://www.djvu.org/forum/phpbb/viewtopic.php?p=1135&&sid=4fc56a4adfc23e656ba88a463e8e2750#1135
Cheers, Gaiason