matecat / MateCat-Filters

Convert any file to XLIFF and back with perfectly preserved formatting! Super easy API, plenty of supported formats and advanced segmentation.
http://filters.matecat.com
GNU Lesser General Public License v3.0

docx: Error opening zipped input file #10

Closed mxposed closed 8 years ago

mxposed commented 8 years ago

Hi!

I launched Filters by following the instructions at https://github.com/matecat/MateCat-Filters/wiki/Build-and-run and I get the following error when I try to convert a .docx document. Am I right that .docx is supposed to be supported by Filters? How do I debug an error like this?

ERROR qtp1844169442-83 [CONVERSION REQUEST FAILED] Error opening zipped input file.
net.sf.okapi.common.exceptions.OkapiIOException: Error opening zipped input file.
    at net.sf.okapi.filters.openxml.OpenXMLFilter.openZipFile(OpenXMLFilter.java:436)
    at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:260)
    at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:140)
    at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:123)
    at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:235)
    at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:205)
    at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:186)
    at com.matecat.converter.core.okapiclient.OkapiClient.generatePack(OkapiClient.java:331)
    at com.matecat.filters.basefilters.DefaultFilter.extractOkapiPack(DefaultFilter.java:56)
    at com.matecat.filters.basefilters.DefaultFilter.extract(DefaultFilter.java:35)
    at com.matecat.filters.basefilters.FiltersRouter.extract(FiltersRouter.java:29)
    at com.matecat.converter.server.resources.ConvertToXliffResource.convert(ConvertToXliffResource.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:308)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:291)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1140)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:403)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:816)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1113)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1047)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
    at org.eclipse.jetty.server.Server.handle(Server.java:517)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:302)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:238)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:57)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:745)

LLCampos commented 8 years ago

I've had the same problem. In my case, I was using Python to send the requests to Filters and was opening the DOCX file in "r" mode, when it should have been "rb" (binary) mode. I can elaborate if you think that will help fix your problem.
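A minimal Python 3 sketch of the fix described above: open the DOCX with "rb" so its bytes reach Filters unchanged. The function name `read_docx_for_upload` and the ZIP-signature check are illustrative additions, not part of Filters or MateCat.

```python
def read_docx_for_upload(path: str) -> bytes:
    """Read a DOCX file as raw bytes so it can be sent to Filters unchanged.

    A DOCX is a ZIP archive, so it must be opened in binary mode ("rb").
    Text mode ("r") decodes the bytes as text and may translate line endings,
    which can corrupt the archive before it ever reaches the server.
    """
    with open(path, "rb") as f:
        data = f.read()
    # A DOCX normally begins with a ZIP local file header, b"PK\x03\x04".
    # This check is a cheap sanity test, not a full validation.
    if not data.startswith(b"PK\x03\x04"):
        raise ValueError(f"{path} does not look like a ZIP/DOCX file")
    return data
```

The returned bytes (or a file object opened with "rb") can then be handed to whatever HTTP client posts the file to Filters; the endpoint and form-field names are not given in this thread.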

giusilvano commented 8 years ago

It seems that Filters received a corrupted file. DOCX files are actually ZIP archives with a different extension. LLCampos's suggestion is right: you may be reading the file the wrong way. Try converting the file through the web interface (by default at localhost:8732); if that works, you can be sure the problem is only in the code you use to send the file.
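Because a DOCX is a ZIP archive, one way to check on the sending side whether the bytes are still intact is to open them with Python's standard `zipfile` module. This is a sketch under stated assumptions: `looks_like_docx` is a hypothetical helper, and it only checks for two parts that Word documents normally contain, `[Content_Types].xml` and `word/document.xml`.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if `data` opens as a ZIP archive containing typical DOCX parts.

    A DOCX is an Office Open XML package: a ZIP archive with, among other
    parts, "[Content_Types].xml" at the root and (normally) "word/document.xml".
    """
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    buf.seek(0)  # is_zipfile() moves the read position
    try:
        with zipfile.ZipFile(buf) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return False
    return "[Content_Types].xml" in names and "word/document.xml" in names
```

If this returns False for the bytes you are about to send but the file converts fine in the web interface, the corruption is happening in your own reading or upload code.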

mxposed commented 8 years ago

@giusilvano You are right, the file converts fine directly through the Filters web interface. I use a separate MateCat installation to send the file to Filters. Are the two supposed to run on the same machine? And where is the best place to ask questions about this? Thanks!

giusilvano commented 8 years ago

MateCat.com and our running Filters instances are hosted on different machines in our infrastructure, so the problem must be elsewhere. I can't tell exactly what is happening in your environment. Check what your Filters instance is receiving (the received file is written to a temp directory), then check for problems with your curl or PHP versions, or for networking issues (firewalls, proxies, etc.).
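One way to do the first check is to compare a checksum of the file you sent with the copy Filters wrote to its temp directory. This sketch assumes you can locate that copy; the thread does not say where the temp directory is, so the path in the usage line below is a placeholder, and `sha256_of` / `same_bytes` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in binary chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(sent_path: str, received_path: str) -> bool:
    """True if the file Filters received is byte-identical to the one sent."""
    return sha256_of(sent_path) == sha256_of(received_path)
```

For example, `same_bytes("original.docx", "/path/to/filters-temp/received.docx")` (placeholder path). A False result means the file was altered between your client and Filters, which points at the sending side (file reading, HTTP client, proxy) rather than at the converter.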

giusilvano commented 8 years ago

I'm closing the issue due to inactivity. @mxposed feel free to add details if you find something that should be fixed on our side.

mxposed commented 8 years ago

@giusilvano I figured it out: I had PHP 5.6 and an old version of MateCat. I pulled the latest master, and this commit fixed the issue: https://github.com/matecat/MateCat/commit/a44d76e84d01bf1a1552b0d650ce34b1a50646f9 Thank you!

giusilvano commented 8 years ago

@mxposed happy to hear that you solved the issue, and that it was not caused by a bug in Filters! :D Bye!