wanglongqi / pdf2djvu

Automatically exported from code.google.com/p/pdf2djvu

Rotated text issue #6

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Get hold of a PDF page with rotated text, or simply rotate a document clockwise or counter-clockwise.
2. Convert the PDF page using PDF to DjVu GUI version 1.0 or 1.1.
3. View the converted DjVu page to see the problem.

What is the expected output? What do you see instead?
I expect to see the rotated text and its text coordinates captured and displayed correctly. Instead, I see one big lump of text, with the text coordinates set to the start and end of the block of rotated text.

What version of the product are you using? On what operating system?
PDF to DjVu GUI version 1.0 and 1.1 on Windows XP.

Please provide any additional information below.
http://www.djvu.org/forum/phpbb/viewtopic.php?p=1135&sid=4fc56a4adfc23e656ba88a463e8e2750#1135

Cheers,
Gaiason

Original issue reported on code.google.com by gaia...@yahoo.com on 22 May 2008 at 4:21

GoogleCodeExporter commented 9 years ago
Text extraction was indeed broken. I fixed it, but rotated text is still extracted incorrectly. That's _probably_ because of a DjVuLibre bug.

Original comment by pro...@gmail.com on 22 May 2008 at 1:02

GoogleCodeExporter commented 9 years ago
See <http://sf.net/tracker/?func=detail&aid=1969580&group_id=32953&atid=406583>.

Original comment by pro...@gmail.com on 25 May 2008 at 9:06

GoogleCodeExporter commented 9 years ago
pdftotext handles rotated text correctly, so reimplementing its algorithm (rather than relying on DjVuLibre) would solve the problem. Note the missing space in the second line of the pdf2djvu output:

$ pdftotext rotated-lorem.pdf - | grep L
Lorem ipsum
Lorem ipsum

$ pdf2djvu -q rotated-lorem.pdf | djvutxt - | grep L
Lorem ipsum 
Loremipsum 
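
To illustrate why rotation matters for word splitting, here is a minimal sketch of the general idea. It is not pdftotext's actual algorithm and not pdf2djvu code. It assumes each glyph comes with a page position and an advance width, and that the rotation angle of the line is already known. The names `Glyph`, `to_baseline_frame`, `group_words` and the `gap_ratio` threshold are hypothetical, chosen only for this example. Glyph positions are first rotated into a frame where the baseline is horizontal, and only then are gaps between glyphs measured.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    # Hypothetical glyph record, not a pdf2djvu or Poppler type.
    char: str
    x: float        # glyph origin on the page
    y: float
    advance: float  # advance width measured along the baseline


def to_baseline_frame(x, y, angle_deg):
    """Rotate a page point by -angle so the text baseline becomes horizontal."""
    a = math.radians(angle_deg)
    return (x * math.cos(a) + y * math.sin(a),
            -x * math.sin(a) + y * math.cos(a))


def group_words(glyphs, angle_deg, gap_ratio=0.3):
    """Split the glyphs of one line into words using gaps along the baseline.

    gap_ratio is an arbitrary threshold (an assumption): a gap wider than
    this fraction of the next glyph's advance starts a new word.
    """
    # Position of every glyph along the (rotated) baseline.
    placed = sorted(
        ((to_baseline_frame(g.x, g.y, angle_deg)[0], g) for g in glyphs),
        key=lambda pair: pair[0],
    )
    words, current, prev_end = [], "", None
    for u, g in placed:
        if prev_end is not None and u - prev_end > gap_ratio * g.advance:
            words.append(current)
            current = ""
        current += g.char
        prev_end = u + g.advance
    if current:
        words.append(current)
    return words
```

With text rotated by 90 degrees, all glyphs of a line share the same page x coordinate, so measuring gaps along x alone cannot separate words. Measuring along the rotated baseline can.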

Original comment by uba...@users.sf.net on 20 Apr 2009 at 7:12
