nisaacson / pdf-extract

Node PDF Extract
MIT License

ghostscript vs imagemagick #10

Open rpedela opened 9 years ago

rpedela commented 9 years ago

By default, tesseract produces gibberish for me. I noticed that the convert call is commented out in favor of gs. I tried convert -depth 8 -background white -flatten -matte -density 300 <input> <output> instead, and tesseract produced great results. The whole process was also much faster: ~15 minutes with gs vs. ~1 minute with convert for 6 pages. Why is ghostscript used rather than imagemagick for the conversion?
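
For anyone who wants to try the same thing from Node, here is a rough sketch of how that convert call could be wrapped. The helper names buildConvertArgs and convertPdf are made up for illustration and are not part of pdf-extract. It assumes ImageMagick's convert binary is on the PATH, and it keeps the flags in the same order as the command above.

```javascript
// Sketch only: wraps the ImageMagick `convert` command quoted in this comment.
// buildConvertArgs and convertPdf are hypothetical helpers, not pdf-extract APIs.
const { execFile } = require('child_process');

// Build the argument list for `convert`, keeping the flag order from the comment.
function buildConvertArgs(inputPath, outputPath) {
  return [
    '-depth', '8',
    '-background', 'white',
    '-flatten',
    '-matte',
    '-density', '300',
    inputPath,
    outputPath,
  ];
}

// Run `convert` without a shell and call back with (err, outputPath).
// Assumes the `convert` executable is installed and on the PATH.
function convertPdf(inputPath, outputPath, callback) {
  execFile('convert', buildConvertArgs(inputPath, outputPath), (err) => {
    callback(err, outputPath);
  });
}
```

Calling convertPdf('page.pdf', 'page.tif', cb) with placeholder file names would then produce an image that can be handed to tesseract. Using execFile with an argument array avoids shell quoting problems when paths contain spaces.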

sawyerh commented 9 years ago

+1. ImageMagick produced better results for me as well, though it proved to be a pain in the ass to install locally.

nisaacson commented 8 years ago

Thanks for the suggestion. I will look into switching over to ImageMagick.

JustinElst commented 8 years ago

Any word on this?