Implement a more sophisticated layer separation algorithm

jwilk-archive / pdf2djvu

PDF to DjVu converter

GNU General Public License v2.0

Implement a more sophisticated layer separation algorithm #7

Open jwilk opened 16 years ago

jwilk commented 16 years ago

Issue reported by chriskarakas at Google Code:

What steps will reproduce the problem?

1. Get a PDF file that contains a high-contrast scan of an old book, like the "facsimiles" offered by some libraries.
2. Convert to DjVu, with no options other than -o.
3. The resulting file looks anti-aliased.

What is the expected output? What do you see instead?

The problem here is that anti-aliasing blurs the letters. Since the letters were not of perfect quality to begin with, you end up with text that is much less readable than in the original PDF.

What version of the product are you using? On what operating system?

pdf2djvu 0.4.11 (DjVuLibre 3.5.20, poppler 0.8.3)

Am I missing anything?

jwilk commented 16 years ago

Could you provide an example PDF file?

jwilk commented 16 years ago

Comment submitted by chriskarakas at Google Code:

Please send me your e-mail address at chris at ... (you can see the rest from above). I will then send you a link to an example PDF.

jwilk commented 16 years ago

To sum up a private discussion with the bug reporter:

The layer separation algorithm is far from optimal. It often sends high-contrast image components (e.g. text) to the wavelet encoder, which is completely inappropriate for them: the resulting image is blurry and the compression ratio is poor.
Foolishly compressed PDFs (e.g. black-and-white images stored as JPEGs) are not uncommon. pdf2djvu could be improved to make it a handy tool for properly recompressing such documents.
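
To make the idea above more concrete, here is a minimal sketch of how an image could be checked for being effectively bitonal and, if so, turned into a foreground mask instead of being wavelet-encoded. This is not pdf2djvu code and not a proposed patch. The function names, the use of Otsu's threshold, and the `margin`, `min_fraction` and `min_contrast` values are all assumptions chosen for illustration. Input is a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels <= t are treated as "dark", pixels > t as "light".
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def looks_bitonal(pixels, margin=32, min_fraction=0.9, min_contrast=128):
    """Heuristic (illustrative thresholds): True if the image is close to pure
    black-and-white, i.e. the two Otsu classes are far apart and nearly all
    pixels sit close to the mean of their own class."""
    t = otsu_threshold(pixels)
    dark = [p for p in pixels if p <= t]
    light = [p for p in pixels if p > t]
    if not dark or not light:
        return False  # uniform image: nothing to separate
    m_dark = sum(dark) / len(dark)
    m_light = sum(light) / len(light)
    if m_light - m_dark < min_contrast:
        return False
    near = sum(1 for p in dark if abs(p - m_dark) <= margin)
    near += sum(1 for p in light if abs(p - m_light) <= margin)
    return near / len(pixels) >= min_fraction


def foreground_mask(pixels):
    """Return a 0/1 list where 1 marks dark (text-like) pixels."""
    t = otsu_threshold(pixels)
    return [1 if p <= t else 0 for p in pixels]


if __name__ == "__main__":
    scan = [245] * 90 + [15] * 10    # mostly paper, some ink
    gradient = list(range(256))      # smooth photo-like ramp
    print(looks_bitonal(scan), looks_bitonal(gradient))
```

In a real converter, an image that passes such a check would go to the bitonal mask layer, while everything else would stay in the wavelet-coded background. A JPEG-compressed scan would also need some tolerance for ringing artifacts around the letters, which this sketch only approximates through `margin`.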

jwilk commented 15 years ago

Issue #9 has been merged into this issue.

jwilk commented 13 years ago

Issue #56 has been merged into this issue.

jwilk commented 13 years ago

Comment submitted by dmjensen7 at Google Code:

Just wanted to point out that didjvu and img2djvu (http://code.google.com/p/didjvu/ and https://github.com/ashipunov/img2djvu), both of which appeared in the past couple of years, claim to have more sophisticated layer separation abilities. (For img2djvu, the bulk of the work is actually performed by another piece of software, "Scan Tailor", http://scantailor.sourceforge.net/ .) I haven't tested either of them yet, but it may be worth checking whether their ideas and/or code can be reused.