roberttdev / dactyl4

DACTYL in Rails 4
MIT License

PDF to gif Conversion May Stop During Conversion #82

Open wsloand opened 9 years ago

wsloand commented 9 years ago

Document 49 did not convert accurately from PDF to GIFs. The errors are evident at the bottom of pages 3, 6, 8, and maybe more, where the page images are truncated.

I'm guessing that this is an error in the underlying conversion library and not directly an error with DACTYL.

roberttdev commented 9 years ago

If you delete the file and re-upload, is there still a problem? There's nothing in the logs (although it was so long ago I may be overlooking something that was different between then and now), so I suspect the little machine we're using maxed out when handling all the files that were uploaded that day. I can't confirm, though, because CPU/Mem usage graphs from that day are long gone by now.

wsloand commented 9 years ago

I just tried re-uploading (the new document is number 68), and it had the same result. The conversion appears to stop at a Greek character mu on page 3 (my guess as to the source of the error). On pages 6, 7, and 8, it stops at an infinity character.

wsloand commented 9 years ago

With a bit more investigation, this appears to be an issue in graphicsmagick (and imagemagick). When I manually ran the conversion with the same command line used for graphicsmagick, I got the error below. (I'll follow up with them. Correction: the issue goes deeper, to GhostScript.) Can we capture the error and reject the upload when graphicsmagick reports one, so that the problem isn't silent in the future?

$ gm convert +adjoin -define pdf:use-cropbox=true -limit memory 256MiB -limit map 512MiB -density 150 -quality 100 nihms513647.pdf[2] nihms513647_2.png
** Error reading a content stream. The page may be incomplete.
** File did not complete the page properly and may be damaged.

** This file had errors that were repaired or ignored.
** The file was produced by:
** >>>> Antenna House PDF Output Library 2.6.0 (Linux) <<<<
** Please notify the author of the software that produced this
** file that it does not conform to Adobe's published PDF
** specification.
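
A minimal sketch of what "capturing the error" could look like, assuming the conversion is launched from Ruby. The helper names (`conversion_warnings`, `convert_page!`) and the `GS_WARNING` pattern are hypothetical and are not part of DACTYL or DocSplit. The pattern is only modeled on the lines in the output above. The sketch also assumes the converter may still exit with status zero when these messages appear, so it checks stderr as well as the exit status.

```ruby
require "open3"

# Matches the asterisk-prefixed messages shown in the output above.
# Assumption: these lines arrive on stderr; the pattern is illustrative only.
GS_WARNING = /\A\*+\s*(Error|File did not complete|This file had errors)/

# Hypothetical helper: returns the warning lines found in converter stderr.
def conversion_warnings(stderr_text)
  stderr_text.each_line.map(&:strip).select { |line| line =~ GS_WARNING }
end

# Hypothetical wrapper: runs a conversion command given as an array
# (no shell involved) and raises if it exits non-zero or prints warnings.
def convert_page!(command)
  _stdout, stderr, status = Open3.capture3(*command)
  warnings = conversion_warnings(stderr)
  unless status.success? && warnings.empty?
    raise "PDF conversion failed (exit #{status.exitstatus}): #{warnings.join(' | ')}"
  end
  true
end

# Example call, using the command from this comment:
# convert_page!(%w[gm convert +adjoin -define pdf:use-cropbox=true
#                  -limit memory 256MiB -limit map 512MiB -density 150
#                  -quality 100 nihms513647.pdf[2] nihms513647_2.png])
```

Raising here would let the upload step reject the document instead of storing truncated page images. Whether a warning should be fatal or only logged is a design choice left open.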

wsloand commented 9 years ago

One additional note: in further testing, I can't convert the file accurately in any way that depends on ghostscript (including imagemagick or graphicsmagick). I can with mupdf.

roberttdev commented 9 years ago

No luck Googling anything about that error or Graphicsmagick / Antenna House failing on infinity symbols. Can't even really find anything about the relationship between Graphicsmagick and Antenna House and whether that library is exchangeable for a better one. How did you follow up with them? I don't want to duplicate effort.

wsloand commented 9 years ago

I followed up with a bug report to Ghostscript (http://bugs.ghostscript.com/show_bug.cgi?id=695874). I think that the right fix may be to replace the graphicsmagick command with mupdf (more accurately, mudraw) in DocSplit. I think that mudraw may be lighter on resources, and since it doesn't rely on the same PDF parser, it will behave differently (and hopefully better).

Update: The ghostscript developers indicate that it's not a bug... continuing to chat with them in that bug report to find out the right way to make it work.
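
The mudraw substitution suggested above could be sketched roughly as follows. The helper names are hypothetical, and this does not show how DocSplit builds its commands. It assumes mudraw's `-r` (resolution) and `-o` (output file) options, with the page list given after the input file. Newer MuPDF releases expose the same tool as `mutool draw`. Note that the `[2]` suffix in the GraphicsMagick command is a zero-based index (page 3), while mudraw page numbers start at 1.

```ruby
# Hypothetical helper: converts a zero-based GraphicsMagick page index
# (the "[2]" in "file.pdf[2]") to a 1-based page number.
def page_from_gm_index(index)
  index + 1
end

# Hypothetical helper: builds a mudraw command that renders one page to
# the given output file at the given resolution (dots per inch).
def mudraw_command(pdf_path, page_number, output_path, density = 150)
  raise ArgumentError, "page_number must be >= 1" if page_number < 1
  ["mudraw", "-r", density.to_s, "-o", output_path, pdf_path, page_number.to_s]
end

# Example: the page that failed in the GraphicsMagick run.
# mudraw_command("nihms513647.pdf", page_from_gm_index(2), "nihms513647_3.png")
```

Building the command as an array keeps file names with spaces or brackets from being interpreted by a shell.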

roberttdev commented 9 years ago

Wow, those guys respond pretty fast. Let me know if the latest comment he replied with works. If so, hopefully we can drop those commands into a GhostScript config file (or pass them along through the Graphicsmagick command called by DC).

Seems like a pretty big issue, I wonder if they have the ability to push for an update to the dependencies in Ubuntu's repos. Although they'll have to track down a package to be the dependency first..

wsloand commented 9 years ago

They are very responsive. I tried their suggestion on my home linux box, and it didn't work for me. I've added more detail, and hopefully they will have an additional suggestion. It does seem like the fix will eventually need to go into either the Ubuntu or Debian repository (our little DACTYL is growing up).

wsloand commented 9 years ago

It looks like the final solution will be that the ghostscript guys will talk to the ubuntu package maintainer and get it fixed. I think our better solution will be to move to mupdf.

roberttdev commented 9 years ago

I assume there's no rush, right? I'm thinking if we can resolve this with a simple 'apt-get update' and 'apt-get upgrade' once the Ubuntu maintainer fixes it in < 3 months, that's the best scenario. If progress stalls out, then we put in the medium effort to swap PDF packages in a few months. Or is there other functionality in mupdf you'd like to leverage?

wsloand commented 9 years ago

Yeah, it's not a rush. All I'm wanting is error-free (or error-caught) pdf to image conversion. There are other things that I'm wanting to work on with the image conversion (like issue #80 ), so we can bundle these together and look at them later.

wsloand commented 9 years ago

It is now also an Ubuntu bug: https://bugs.launchpad.net/bugs/1438494

roberttdev commented 9 years ago

Getting operating systems modified.. we're in the Big Time :)
