pkp / ots

PKP XML Parsing Service
GNU General Public License v3.0

unoconv periodically dies and doesn't restart gracefully #49

Closed by axfelix 8 years ago

axfelix commented 8 years ago

Still need to figure out how best to handle this. These aren't hangs per se, so we can't just check for and terminate unusually long-running processes, but after big corpus runs I occasionally find that all doc input is stuck because unoconv can't convert.

axfelix commented 8 years ago

Seems a little happier if unoconv --listener & is run in the background when the queues are started, since that gives it a permanent "listener" and it doesn't have to do as much re-jigging of LibreOffice. I'll see whether this holds up over a period of testing, then add it to the queue startup script.
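For reference, a minimal sketch of what starting the listener from a queue startup script could look like. The only thing taken from the thread is the unoconv --listener invocation; the function names and the idea of launching it from Python are assumptions for illustration, not what the actual startup script does.

```python
import shutil
import subprocess


def listener_command(unoconv_bin="unoconv"):
    """Build the argv for a long-lived unoconv listener (hypothetical helper)."""
    return [unoconv_bin, "--listener"]


def start_listener(unoconv_bin="unoconv"):
    """Start the listener in the background.

    Returns the Popen handle, or None if the binary is not on PATH.
    """
    if shutil.which(unoconv_bin) is None:
        return None
    return subprocess.Popen(
        listener_command(unoconv_bin),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The startup script would call start_listener() once before the queue workers come up and keep the handle around so it can call terminate() on shutdown.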

axfelix commented 8 years ago

Running a listener didn't seem to help, but I noticed we were behind on unoconv and LibreOffice versions, so I just bumped unoconv 0.6 -> 0.7 and LibreOffice 4.3 -> 5.0. We'll see if that has an effect.

axfelix commented 8 years ago

Seems to be working with the version bump. Closing this for now.

axfelix commented 8 years ago

Reopening this because it's still causing problems. I'm beginning to suspect that the issue is caused by meTypeset spawning unoconv processes to handle documents containing images in .wmf format, since when we last checked those couldn't be converted by imagemagick due to a breakage. The server can't cope well with multiple LibreOffice processes trying to run simultaneously, and the meTypeset-spawned processes are functionally overriding our queueing.

I'll see if there are upstream fixes to imagemagick...
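One way to test this suspicion would be to flag incoming .docx files that contain .wmf images before they reach meTypeset. A rough sketch, assuming only that a .docx is a zip archive whose members can be listed; the function name is made up for illustration:

```python
import zipfile


def wmf_members(docx_path):
    """Return the names of all .wmf entries inside a .docx (zip) file."""
    with zipfile.ZipFile(docx_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(".wmf")
        ]
```

Logging the result per input would show whether the failed conversions line up with documents that carry .wmf images.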

axfelix commented 8 years ago

Nope, libwmf is still basically orphaned, which means unoconv will likely be required for .wmf images from now on (and unfortunately, this is the format in which images inside .docx zips are stored by default when .doc is up-converted to .docx). I wonder if I can get the LibreOffice devs to change that ... doubt it but I'll open a bug anyway. The alternative is to try something more sysadminny on our end to make sure no LO process ever tries to start while another one is running.
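A sketch of the "sysadminny" option, assuming a Unix host: serialize every conversion behind an exclusive file lock so that only one LibreOffice-backed process runs at a time. The lock path, function names, and timeout are hypothetical, and this only helps if meTypeset's own unoconv calls also go through the same wrapper (for example via a wrapper script placed ahead of the real binary on PATH).

```python
import fcntl
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Block until an exclusive flock on lock_path is held, release on exit."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_serialized(command, lock_path="/tmp/libreoffice.lock", timeout=300):
    """Run command while holding the lock; return its exit code.

    Raises subprocess.TimeoutExpired if the command exceeds timeout seconds.
    """
    with exclusive_lock(lock_path):
        return subprocess.run(command, timeout=timeout, check=False).returncode
```

A call would look something like run_serialized(["unoconv", "-f", "docx", "input.doc"]). Callers queue up on the lock instead of starting a second LO instance.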

axfelix commented 8 years ago

Oh hell https://bugs.documentfoundation.org/buglist.cgi?quicksearch=wmf

axfelix commented 8 years ago

https://bugs.documentfoundation.org/show_bug.cgi?id=97441

axfelix commented 8 years ago

This is causing us to fail a lot more .doc/.docx inputs than we should, just due to queueing mishaps, though it's typically only an issue during big corpus runs.

axfelix commented 8 years ago

Closing this again, as updates to LO/unoconv seem to have fixed it to the point where it doesn't trigger consistently, and we're now seeing more inconsistent hangs in other parts of the stack: https://github.com/pkp/xmlps/issues/57