wanglongqi / pdf2djvu

Automatically exported from code.google.com/p/pdf2djvu

Implement a more sophisticated layer separation algorithm #7

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

1. Get a PDF file that contains a high-contrast scan of an old book, like
the "facsimiles" offered by some libraries.

2. Convert it to DjVu, using no options other than -o.

3. The resulting file looks anti-aliased.

What is the expected output? What do you see instead?

The problem here is that anti-aliasing blurs the letters. Since the
letters were not of perfect quality to begin with, you end up with much less
readable text than in the original PDF.

What version of the product are you using? On what operating system?

pdf2djvu --version

pdf2djvu 0.4.11 (DjVuLibre 3.5.20, poppler 0.8.3)

Am I missing anything?

Original issue reported on code.google.com by chriskar...@googlemail.com on 30 Jun 2008 at 5:14

GoogleCodeExporter commented 9 years ago

Could you provide an example PDF file?

Original comment by pro...@gmail.com on 4 Jul 2008 at 2:11

GoogleCodeExporter commented 9 years ago

Please send me your e-mail address at chris at ... (you can see the rest from 
above).
I will then send you a link to an example PDF.

Original comment by chriskar...@googlemail.com on 4 Jul 2008 at 8:40

GoogleCodeExporter commented 9 years ago

To sum up a private discussion with the bug reporter:
- The layer separation algorithm is far from optimal. It often causes
high-contrast image components (e.g. text) to be wavelet-encoded, which is
completely inappropriate: the resulting image is blurry and the compression
ratio is poor.
- Foolishly compressed PDFs (e.g. black-and-white images stored as JPEGs) are
not uncommon. pdf2djvu could be improved to make it a handy tool for properly
recompressing such documents.
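
To illustrate the kind of separation being asked for, here is a minimal sketch in Python. It is not pdf2djvu's algorithm and not what didjvu or any other tool does. It assumes a page is available as rows of 8-bit grayscale values and uses Otsu's global threshold, chosen only because it is simple. The names `otsu_threshold` and `separate_layers` are made up for this example. Dark pixels go into a bitonal mask (the part that would be coded losslessly as text), and the remaining pixels form a smooth background that suits wavelet coding.

```python
def otsu_threshold(gray):
    """Return the 0..255 threshold that maximises between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of those pixel values
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        n_light = total - n_dark
        if n_dark == 0:
            continue  # no pixels in the dark class yet
        if n_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        var = n_dark * n_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def separate_layers(gray):
    """Split a grayscale page into (mask, background).

    mask[y][x] is True for dark (text-like) pixels.
    background is the page with masked pixels replaced by the average
    paper tone, so no sharp edges remain for the wavelet coder to blur.
    """
    t = otsu_threshold(gray)
    mask = [[value <= t for value in row] for row in gray]
    light = [value for row in gray for value in row if value > t]
    # Assumption: fall back to white if the page has no light pixels.
    paper = round(sum(light) / len(light)) if light else 255
    background = [
        [paper if is_text else value for value, is_text in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]
    return mask, background
```

A global threshold like this will fail on unevenly lit scans or pages that mix text with photographs. A practical implementation would more likely need local (adaptive) thresholding.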

Original comment by pro...@gmail.com on 9 Jul 2008 at 1:03

Changed title: When PDF contains a scan, the DjVu file looks anti-aliased and is difficult to read
Changed state: Accepted
Added labels: Type-Enhancement, Component-Logic
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Issue 9 has been merged into this issue.

Original comment by uba...@users.sf.net on 3 Apr 2009 at 12:15

GoogleCodeExporter commented 9 years ago

Issue 56 has been merged into this issue.

Original comment by jwilk@jwilk.net on 21 Feb 2011 at 9:57

GoogleCodeExporter commented 9 years ago

Just wanted to point out that didjvu (http://code.google.com/p/didjvu/) and
img2djvu (https://github.com/ashipunov/img2djvu), both of which appeared in
the past couple of years, claim to have more sophisticated layer separation
abilities. (For img2djvu, the bulk of the work is actually performed by
another piece of software, Scan Tailor: http://scantailor.sourceforge.net/)
I haven't tested either of them yet, but it may be worth checking whether
their ideas and/or code are reusable.

Original comment by dmjens...@gmail.com on 18 Oct 2011 at 8:53

GoogleCodeExporter commented 9 years ago

Original comment by jwilk@jwilk.net on 13 Apr 2014 at 5:09

Added labels: Priority-Low
Removed labels: Priority-Medium