gamboz closed this issue 6 years ago.
Thanks for the report, I'm investigating your issue.
The issue is that our docx2hub module converts
<m:t xml:space="preserve">40 </m:t>
to
<mml:mn>40</mml:mn>
<mml:mi> </mml:mi>
I think the whitespace should be encoded either as "\ " (a backslash followed by a space) or as \text{ }.
Maybe we should convert an mi that only contains (significant) whitespace to mtext or mspace. Then the TeX code will probably be OK.
There’s an mml-space-handling option in docx2hub.xpl. We currently pass it only to the MathType converter. This option should eventually be passed to omml2mml.xsl, too (and acted upon accordingly).
But turning it into an mtext for now is probably the quickest solution.
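To make the proposal concrete, here is a minimal Python sketch of a post-processing pass that rewrites a whitespace-only mi according to the two mml-space-handling values that come up in this thread, mspace and xml-space. It is not the XSLT used in docx2hub or omml2mml.xsl. The function name fix_whitespace_mi is hypothetical, and the default width of 0.25em is an assumption that simply matches the mspace output quoted in this thread.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Serialize with the familiar "mml:" prefix instead of "ns0:".
ET.register_namespace("mml", MML_NS)


def fix_whitespace_mi(math, mode="mspace", width="0.25em"):
    """Rewrite every <mml:mi> that contains only whitespace (hypothetical helper).

    mode="xml-space": <mml:mi> </mml:mi> -> <mml:mtext xml:space="preserve"> </mml:mtext>
    mode="mspace":    <mml:mi> </mml:mi> -> <mml:mspace width="0.25em"/>
    """
    if mode not in ("mspace", "xml-space"):
        raise ValueError(f"unknown mml-space-handling mode: {mode}")
    # Materialize the list first so retagging elements cannot disturb iteration.
    for el in list(math.iter(f"{{{MML_NS}}}mi")):
        text = el.text or ""
        if not text or text.strip() or len(el):
            continue  # empty, has visible content, or has children: leave it alone
        if mode == "xml-space":
            el.tag = f"{{{MML_NS}}}mtext"
            el.set(XML_SPACE, "preserve")
        else:
            el.tag = f"{{{MML_NS}}}mspace"
            el.text = None  # mspace is an empty element
            el.set("width", width)
    return math


SAMPLE = (
    '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="inline">'
    "<mml:mn>40</mml:mn><mml:mi> </mml:mi><mml:mtext>MHz</mml:mtext>"
    "</mml:math>"
)

if __name__ == "__main__":
    for mode in ("mspace", "xml-space"):
        fixed = fix_whitespace_mi(ET.fromstring(SAMPLE), mode=mode)
        print(mode, ET.tostring(fixed, encoding="unicode"))
```

The rewrite is done in place on the parsed tree so that the surrounding mn and mtext siblings keep their order; only the tag, text, and attributes of the offending mi change.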
I resolved one issue: omml2mml.xsl now converts the m:t with whitespace to
<mml:mn>40</mml:mn>
<mml:mtext xml:space="preserve"> </mml:mtext>
<mml:mtext>MHz</mml:mtext>
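The same fix can be thought of at tokenization time. The following Python sketch is a hypothetical stand-in for the XSLT logic, not a copy of it: it splits the text of one m:t run into MathML tokens and classifies whitespace as mtext with xml:space="preserve" instead of mi. The regular expression, the token classes, and the function names tokenize_run and to_mathml are assumptions for illustration.

```python
import re
from typing import List, Tuple
from xml.sax.saxutils import escape

# Assumed token classes for the text of a single <m:t> run:
# numbers -> mn, whitespace runs -> mtext, single letters -> mi, anything else -> mo.
TOKEN_RE = re.compile(
    r"(?P<mn>\d+(?:\.\d+)?)"  # integer or decimal number
    r"|(?P<mtext>\s+)"        # whitespace is kept as text, never as an identifier
    r"|(?P<mi>[^\W\d_])"      # one letter per identifier
    r"|(?P<mo>\S)"            # any other visible character
)


def tokenize_run(text: str) -> List[Tuple[str, str]]:
    """Split run text into (MathML element name, content) pairs (hypothetical helper)."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def to_mathml(tokens: List[Tuple[str, str]]) -> str:
    """Render tokens as mml:* elements; mtext tokens (always whitespace here) get xml:space."""
    parts = []
    for name, content in tokens:
        attr = ' xml:space="preserve"' if name == "mtext" else ""
        parts.append(f"<mml:{name}{attr}>{escape(content)}</mml:{name}>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_mathml(tokenize_run("40 ")))
```

For the run text "40 " this yields the mn and the whitespace-preserving mtext quoted above, rather than the mn plus whitespace-only mi from the original report.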
Unfortunately, there seems to be a bit of MathML normalization in our pipeline, which drops the mtext. I'll investigate this further.
I just noticed that mml-space-handling is already being honored! If you just invoke docx2hub.xpl, the default setting, mspace, will kick in, and the resulting expression will look like this:
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="inline">
<mml:mn>40</mml:mn>
<mml:mspace width="0.25em"/>
<mml:mtext>MHz</mml:mtext>
</mml:math>
I haven’t tested it with the full docx2tex pipeline though.
For docx2tex, this option was set to xml-space, so <mml:mtext xml:space="preserve"> </mml:mtext> should be the appropriate output for that value. Unfortunately, after I fixed the MathML normalization, there were lots of \text{} commands in our test data. This happens when authors use the text style in the equation editor where it is not necessary. To write 40 MHz, you do not need an equation editor at all.
Therefore, I've changed mml-space-handling from xml-space to mspace for docx2tex, which results in fewer unintended \text{} commands where authors wrote their equations sloppily. With this change, the equation now reads as follows:
detector at $40\:\mathrm{MHz}$, i.e.,
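To see how the two space-handling modes could end up in TeX, here is a toy MathML-to-TeX serializer in Python. It is not docx2tex's actual converter: the mapping of mtext to \mathrm{...}, of whitespace-only mtext to \text{ }, and of a 0.25em mspace to \: are assumptions chosen to reproduce the line above, and the function name mml_to_tex is hypothetical. It handles only a flat sequence of tokens and does not escape TeX special characters.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"

# Assumed mspace-width -> TeX spacing table. Only the 0.25em -> \: entry is
# taken from the docx2tex output quoted above; other widths fall back to "\ ".
MSPACE_TO_TEX = {"0.25em": r"\:"}


def mml_to_tex(math: ET.Element) -> str:
    """Toy serializer for a flat <mml:math> (hypothetical; no nesting, no escaping)."""
    out = []
    for el in math:
        tag = el.tag.replace(MML, "", 1)
        text = el.text or ""
        if tag in ("mn", "mi", "mo"):
            out.append(text)
        elif tag == "mspace":
            out.append(MSPACE_TO_TEX.get(el.get("width", ""), "\\ "))
        elif tag == "mtext" and not text.strip():
            out.append("\\text{" + text + "}")  # preserved whitespace
        elif tag == "mtext":
            out.append("\\mathrm{" + text + "}")  # upright text such as units
        else:
            raise NotImplementedError(f"unsupported element: {tag}")
    return "".join(out)


if __name__ == "__main__":
    src = (
        '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="inline">'
        '<mml:mn>40</mml:mn><mml:mspace width="0.25em"/><mml:mtext>MHz</mml:mtext>'
        "</mml:math>"
    )
    print("$" + mml_to_tex(ET.fromstring(src)) + "$")
```

Under these assumptions, the mspace variant yields 40\:\mathrm{MHz}, while the same expression in xml-space mode would come out as 40\text{ }\mathrm{MHz}, which illustrates how the xml-space setting leads to extra \text{} commands.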
Hi, thank you for the fast solution.
I'm not sure whether it is related to this issue, but since the last commit my clone of docx2tex fails. I've also tried a new, pristine checkout. The errors are related to conf.csv not validating and to an "Undeclared variable in XPath expression: $image-output-dir" error. The first error disappears if I specify the conf.xml file with the "-c" option of d2t.
Please find the docx file and the d2t log here: https://medialab.sissa.it/owncloud/index.php/s/6I6rKxHflXeu3co
The bug is fixed. I recently added an option to pass a custom image directory.
In the following docx file, the space between "40" and "MHz" is lost:
https://medialab.sissa.it/owncloud/index.php/s/zkxFGDvNAehVatl
I'm not sure whether it is a bug, but I'm reporting it because the TeX/PDF output and the docx file look different.