transpect / docx2tex

Converts Microsoft Word docx to LaTeX
BSD 2-Clause "Simplified" License

Problems with Endnote references #4

Open j3mdamas opened 9 years ago

j3mdamas commented 9 years ago

I have a document in which I used Endnote to manage the references. It is the same file as in #3.

The superscripted reference numbers in the main text are all replaced by \href{}{}, so the resulting PDF shows nothing where the numbers should be.

gimsieke commented 9 years ago

Yes, technically they are created as hyperlinks within the document. I think we should generate at least \label and \ref when dealing with internal links. But this won’t help in this case. Ideally we’d parse the enclosed Endnote XML and convert it to BibTeX. Parsing escaped XML requires a commercial Saxon XSLT processor that is not included in the demo. We might offer this as a commercial service later. For the time being, I think we can recognize the special type of instruction text that Endnote creates within these hyperlinks. It looks like <w:instrText xml:space="preserve"> ADDIN EN.CITE. Then we can convert the links to \cite and group the paragraphs that they link to as \bibitem entries in a bibliography environment. I can’t tell you, however, when we’ll find the time to implement this.
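To illustrate the recognition step described above, here is a minimal Python sketch (not docx2tex's actual XSLT). It scans WordprocessingML for w:instrText elements whose text begins with ADDIN EN.CITE. The function name and the sample document, including the shape of the Endnote payload, are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_endnote_instructions(document_xml: str) -> list[str]:
    """Return the text of every field instruction that looks like an Endnote citation."""
    root = ET.fromstring(document_xml)
    hits = []
    for instr in root.iter(f"{{{W_NS}}}instrText"):
        text = (instr.text or "").strip()
        # Endnote marks its citation fields with this prefix
        if text.startswith("ADDIN EN.CITE"):
            hits.append(text)
    return hits


# Hypothetical sample: one Endnote field and one unrelated field
SAMPLE = f'''<w:document xmlns:w="{W_NS}">
  <w:body><w:p>
    <w:r><w:instrText xml:space="preserve"> ADDIN EN.CITE &lt;EndNote&gt;&lt;Cite&gt;&lt;Author&gt;Doe&lt;/Author&gt;&lt;/Cite&gt;&lt;/EndNote&gt;</w:instrText></w:r>
    <w:r><w:instrText xml:space="preserve"> PAGEREF _Toc1 </w:instrText></w:r>
  </w:p></w:body>
</w:document>'''

hits = find_endnote_instructions(SAMPLE)
print(len(hits))
```

Only the first field is reported; the PAGEREF field is ignored because it lacks the Endnote prefix.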

j3mdamas commented 9 years ago

Thanks for the quick reply! Yes, that would be ideal :-) Is there a way to quickly hack the program to behave like codeplex's doc2tex (which does not care about Endnote and uses the explicit superscripted numbers)?

j3mdamas commented 9 years ago

P.S.: I feel like you are on the brink of telling me "If you like the code on codeplex so much, why don't you use it instead?" :-) But I am trying to find a more recent, supported option rather than rely on unsupported (?) code. Once again, thanks for your help!

gimsieke commented 9 years ago

Yes, only rendering the superscripts (i.e., the link text) should require a bit less work than the \cite/\bibitem processing, which in turn requires significantly less work than the full Endnote→BibTeX conversion. I’m afraid this weekend will be the earliest time that we can look into implementing some solution or even a hack.
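As a rough illustration of the two cheaper options compared above, the following Python sketch renders the same citation either as a plain superscript or as \cite with a matching thebibliography environment. The helper names, citation keys, and reference texts are hypothetical; this is not how docx2tex generates its output.

```python
def render_superscript(link_text: str) -> str:
    """Cheapest option: keep only the visible link text as a superscript."""
    return rf"\textsuperscript{{{link_text}}}"


def render_cite(key: str) -> str:
    """Mid-effort option: turn the link into a citation command."""
    return rf"\cite{{{key}}}"


def render_bibliography(entries: dict[str, str]) -> str:
    """Group the linked reference paragraphs as \\bibitem entries."""
    # The second argument is LaTeX's widest-label hint; the entry count is
    # used here as a simple approximation.
    lines = [rf"\begin{{thebibliography}}{{{len(entries)}}}"]
    for key, text in entries.items():
        lines.append(rf"\bibitem{{{key}}} {text}")
    lines.append(r"\end{thebibliography}")
    return "\n".join(lines)


# Hypothetical references keyed by made-up citation keys
refs = {"ref1": "Doe, J. A made-up article.", "ref2": "Roe, R. Another made-up article."}
print(render_superscript("1"))
print(render_cite("ref1"))
print(render_bibliography(refs))
```

The superscript variant needs only the link text, whereas the \cite variant also needs the target paragraphs, which is why it involves more work.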

Regarding your other comment: Since our goal is to achieve world domination in the field of XML-based file format conversion & checking tools, there’s no problem in challenging our current attempts instead of reverting to other solutions.

We are interested in improving these tools, and we appreciate your feedback.

j3mdamas commented 9 years ago

Sure, take your time. Let me know about it. And I am glad I can help then :-)

gimsieke commented 5 years ago

Since escaped XML can now be parsed with the open-source Saxon product (using the XPath 3.1 function parse-xml()), handling Endnote references can now be incorporated into docx2tex, without the need to supply a Saxon license key. Unfortunately, I haven’t kept the file that you sent for #3, therefore I don’t have an Endnote code example to estimate the amount of work necessary. Can you re-send it? We will probably implement Endnote→BibTeX conversion if someone pays us to do so (see #39) or if we need to handle Endnote references in our own production.
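For readers unfamiliar with the idea, parsing escaped XML means taking a string that contains markup and turning it into a tree of its own. The sketch below shows the same step in Python rather than with parse-xml() in XSLT; the instruction string and its element names (EndNote, Cite, Author, Year) are assumptions for illustration, since no real Endnote sample is available in this thread.

```python
import xml.etree.ElementTree as ET

PREFIX = "ADDIN EN.CITE"


def parse_endnote_payload(instruction: str) -> ET.Element:
    """Strip the field prefix and parse the remaining string as XML."""
    text = instruction.strip()
    if not text.startswith(PREFIX):
        raise ValueError("not an Endnote citation instruction")
    payload = text[len(PREFIX):].strip()
    # Roughly what parse-xml() does: string in, element tree out
    return ET.fromstring(payload)


# Hypothetical instruction text, already unescaped by the outer XML parser
instruction = (
    " ADDIN EN.CITE <EndNote><Cite>"
    "<Author>Doe</Author><Year>2015</Year>"
    "</Cite></EndNote>"
)

root = parse_endnote_payload(instruction)
authors = [a.text for a in root.iter("Author")]
years = [y.text for y in root.iter("Year")]
print(root.tag, authors, years)
```

Once the payload is a tree, fields such as author and year can be read directly, which is the starting point for any Endnote→BibTeX mapping.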