Text extraction was indeed broken. I fixed it, but rotated text is still extracted incorrectly. That's _probably_ because of a DjVuLibre bug.
Original comment by pro...@gmail.com on 22 May 2008 at 1:02
See <http://sf.net/tracker/?func=detail&aid=1969580&group_id=32953&atid=406583>.
Original comment by pro...@gmail.com on 25 May 2008 at 9:06
pdftotext handles rotated text fine, so reimplementing its algorithm (rather than relying on DjVuLibre) would solve the problem:
$ pdftotext rotated-lorem.pdf - | grep L
Lorem ipsum
Lorem ipsum
$ pdf2djvu -q rotated-lorem.pdf | djvutxt - | grep L
Lorem ipsum
Loremipsum
Original comment by uba...@users.sf.net on 20 Apr 2009 at 7:12
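To illustrate the general idea behind handling rotated text (measuring the gaps between glyphs along the text line's own baseline direction rather than along the page's x axis), here is a minimal Python sketch. It is not pdftotext's or DjVuLibre's actual code: the names `Glyph`, `layout`, `join_glyphs` and the `space_ratio` threshold of 0.3 are hypothetical, and it assumes every glyph already carries its position, its advance width and the rotation angle of its line.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record: one character and its position on the page."""
    char: str
    x: float
    y: float
    advance: float  # glyph width measured along the baseline


def layout(text, angle_deg, origin=(10.0, 10.0), advance=1.0):
    """Place the non-space characters of `text` along a baseline rotated by
    `angle_deg`; spaces only advance the pen and produce no glyph."""
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), math.sin(theta)
    glyphs, s = [], 0.0
    for ch in text:
        if ch != " ":
            glyphs.append(Glyph(ch, origin[0] + s * dx, origin[1] + s * dy, advance))
        s += advance
    return glyphs


def join_glyphs(glyphs, angle_deg, space_ratio=0.3):
    """Join one line of glyphs into a string, inserting a space whenever the
    gap measured along the baseline direction exceeds
    space_ratio * (advance of the previous glyph). The threshold is an assumption."""
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), math.sin(theta)

    def pos(g):
        # Project the glyph origin onto the baseline direction.
        return g.x * dx + g.y * dy

    out, prev = [], None
    for g in sorted(glyphs, key=pos):
        if prev is not None:
            gap = pos(g) - (pos(prev) + prev.advance)
            if gap > space_ratio * prev.advance:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


if __name__ == "__main__":
    vertical = layout("Lorem ipsum", 90.0)
    # Gaps measured along the rotated baseline keep the word break.
    print(join_glyphs(vertical, 90.0))  # Lorem ipsum
    # Gaps measured along the page's x axis lose it.
    print(join_glyphs(vertical, 0.0))   # Loremipsum
```

The second call only shows how ignoring the rotation can merge words into the kind of output seen in the djvutxt transcript; it is an illustration, not a diagnosis of the DjVuLibre bug.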
Original issue reported on code.google.com by gaia...@yahoo.com on 22 May 2008 at 4:21