transpect / docx2tex

Converts Microsoft Word docx to LaTeX
BSD 2-Clause "Simplified" License

Message: docx2hub error on unzipping. #1

Closed: zopyx closed this issue 9 years ago

zopyx commented 9 years ago

Message: docx2hub error on unzipping. Zip file seems to be corrupted: /infektionen-bei-haematologischen-und-onkologischen-patienten-uebersicht.docx (No such file or directory)

ERROR: err:XD0001:Only whitespace text nodes can appear at the top level in a document
ERROR: err:XD0001:Only whitespace text nodes can appear at the top level in a document
ERROR: err:XD0001:Only whitespace text nodes can appear at the top level in a document
ERROR: It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.

I can provide the sample file by email, since GitHub does not support DOCX uploads.

The issue appears to be specific to Mac OS X. Converting the same file on Linux works.

gimsieke commented 9 years ago

What’s the full path of the file? It isn’t /infektionen-bei-haematologischen-und-onkologischen-patienten-uebersicht.docx, is it?

zopyx commented 9 years ago

The problem seems to be that the conversion does not deal with relative paths properly. Using absolute paths seems to work.

gimsieke commented 9 years ago

It might try to resolve relative paths with respect to the front-end script’s directory. Can you disclose your setup? OS (= Mac OS X?), cwd, full invocation command line? If the issue isn’t obvious to us, we’ll get hold of a Mac next week and reproduce and fix it. Thanks for reporting this issue!
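To make the suspected failure concrete: the same relative argument names different files depending on which directory it is anchored at. The bash sketch below uses a hypothetical helper, `anchor_path`, that is not part of docx2tex; it only joins a relative path onto a chosen base directory so the two candidate bases (the caller's cwd vs. the script's own directory) can be compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of docx2tex): anchor an input path at a base
# directory. Absolute paths pass through unchanged; relative ones are joined.
anchor_path() {
  local base=$1 input=$2
  case $input in
    /*) printf '%s\n' "$input" ;;                  # already absolute
    *)  printf '%s/%s\n' "${base%/}" "$input" ;;   # relative: prefix the base
  esac
}

# Expected base for a user-supplied argument: the caller's working directory.
#   anchor_path "$PWD" ../some.docx
# Suspected wrong base: the directory that contains the front-end script.
#   anchor_path "$(cd -- "$(dirname -- "$0")" && pwd -P)" ../some.docx
```

The two results only coincide when the front end is invoked from its own directory. The output may still contain `..` segments; normalizing them is a separate step.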

zopyx commented 9 years ago

./d2t ../some.docx

caused a failure on Mac OS X,

and this failed on Linux:

[ajung@dev1 docx2text]$ ./d2t lungenkarzinom-nicht-kleinzellig-nsclc.docx
starting docx2tex
Errors encountered while running docx2tex. Please see /home/ajung/src/docx2text/lungenkarzinom-nicht-kleinzellig-nsclc.log for details.


gimsieke commented 9 years ago

Maybe this is related to the real_dir() bash function that Jim Fuller and I devised as a replacement for readlink -f, which is missing on vanilla Mac installations. But it may also be an error in the tr:file-uri step that @mkraetke forked from svn (also: letex:uri-composer()), where I have committed some fixes in the meantime.
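Since `readlink -f` is missing on stock Mac OS X, a common portable workaround resolves the directory part of a path with `cd` and `pwd -P` in a subshell. The sketch below is a hypothetical `abs_path` function illustrating that idea; it is not the actual real_dir() from docx2tex, whose code is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real_dir() function shipped with docx2tex.
# Print an absolute, symlink-free path for a file path that may be relative,
# using only cd and pwd -P instead of `readlink -f`.
abs_path() {
  local dir base
  dir=$(dirname -- "$1")
  base=$(basename -- "$1")
  # Resolve the directory in a subshell so the caller's cwd is not changed.
  # CDPATH is cleared so cd neither searches it nor echoes a path to stdout;
  # pwd -P prints the physical path with ".." and symlinks resolved.
  dir=$(CDPATH= cd -- "$dir" 2>/dev/null && pwd -P) || return 1
  # Strip a trailing slash so "/" + "a.docx" becomes "/a.docx", not "//a.docx".
  printf '%s/%s\n' "${dir%/}" "$base"
}
```

Unlike `readlink -f`, this does not follow a symlink in the final path component, and it returns 1 with no output if the directory part does not exist.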

@mkraetke, please check whether the github version matches the current svn version.

@zopyx: We haven’t migrated all transpect-related code to GitHub yet. Some libraries currently exist both on our svn and on GitHub, some only on svn.

zopyx commented 9 years ago

That's why I am testing and poking you :-)

mkraetke commented 9 years ago

OK, I've tested with some DOCX files on OSX and they worked. Your file converts well on Windows and Linux, but I can reproduce the error with Mac OSX. I hope I can figure out the bug and fix it soon.

mkraetke commented 9 years ago

The bug is now fixed. It was caused by an older bug in our URI resolver module that had not yet been fixed in our GitHub repository. Unfortunately, the GitHub repositories are currently a bit older than their SVN counterparts. This reminds me that we should soon move the other modules completely to GitHub.

Thank you for your valuable bug report!