michal-h21 / make4ht

Build system for tex4ht

odt output for $\ln x^*$ #54

Closed: gl-utah closed this issue 2 years ago

gl-utah commented 2 years ago

The LaTeX document

\documentclass{article}
\begin{document}
$\ln x^*$
\end{document}

processed with "make4ht -f odt filename" produces three LibreOffice "errors", rendered as ln ¿ x¿ ¿, whereas LaTeX correctly produces ln x^*. I wonder whether make4ht can be configured so that these errors do not occur. The attached file has screenshots and more detail: ln_odt_verbose.docx

michal-h21 commented 2 years ago

I would say that these are issues in LibreOffice's math rendering, because the MathML code that TeX4ht produces is valid and is displayed correctly in browsers (try make4ht filename "mathml,mathjax" to see how it should look). I've found that the following configuration file works as a fix:

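\Preamble{xhtml}
% make ":" a letter, so that internal TeX4ht macros such as \a:mathml can be used
\catcode`\:=11
% math class 1 (Op, e.g. \ln): output <mi> followed by U+2061 FUNCTION APPLICATION
\Configure{MathClass}{1}{*}{<\a:mathml mi\Hnewline \mml:class="MathClass-op">}{</\a:mathml mi><mo> &\#x2061;<!--FUNCTION APPLICATION--></mo>}{}
% math class 2 (Bin, e.g. *): wrap binary operators in <mtext> instead of <mo>
\Configure{MathClass}{2}{*}{<\a:mathml mtext\Hnewline \mml:class="MathClass-op">}{</\a:mathml mtext>}{}
% restore the usual category code of ":"
\catcode`\:=12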
\begin{document}
\EndPreamble

You can try it using make4ht -c config.cfg -f odt filename.

I will add \Configure{MathClass}{1} to the TeX4ht sources, as it works in both the HTML and ODT formats. The second configuration is more complicated, because it changes the output of a lot of commands, and I don't think the <mtext> element should be used for, for example, \times. So I will probably not add it to the TeX4ht sources.

gl-utah commented 2 years ago

I very much appreciate your help. Your solution works!

I see the drawback of the <mtext> part of this solution: it encodes every binary operator as text. TeX is smart enough to change symbols that are usually of type Binary into type Ordinary when type Binary is inappropriate (TeXbook, Appendix G, rule 5), but I guess LibreOffice can't do this.
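To make the TeX behaviour mentioned above concrete, here is a minimal Lua sketch of the Bin-to-Ord adjustment from TeXbook Appendix G (rule 5, plus its companion rule 6), together with the conversion TeX applies to a Bin that ends a list. It works on a plain list of atom-type strings, which is a simplification: real TeX operates on noads and also handles \left/\right delimiters. The function name adjust_bins is hypothetical. A superscript that contains only * is a one-atom list, so rule 5 turns that Bin into Ord, which is why x^* is spaced correctly in LaTeX.

```lua
-- Simplified sketch of TeXbook Appendix G, rules 5 and 6.
-- Atoms are plain strings: "Ord", "Op", "Bin", "Rel", "Open", "Close", "Punct".

-- a Bin that follows one of these (or starts the list) becomes Ord (rule 5)
local demotes_following_bin = {
  Bin = true, Op = true, Rel = true, Open = true, Punct = true,
}

-- a Bin that directly precedes one of these becomes Ord (rule 6)
local demotes_preceding_bin = { Rel = true, Close = true, Punct = true }

local function adjust_bins(atoms)
  local result = {}
  for i, atom in ipairs(atoms) do
    local prev = result[i - 1] -- previous atom type, after adjustment
    if atom == "Bin" and (prev == nil or demotes_following_bin[prev]) then
      atom = "Ord" -- rule 5
    elseif demotes_preceding_bin[atom] and prev == "Bin" then
      result[i - 1] = "Ord" -- rule 6
    end
    result[i] = atom
  end
  -- a Bin left at the very end of the list is treated as Ord as well
  if result[#result] == "Bin" then
    result[#result] = "Ord"
  end
  return result
end

-- x^* : the superscript list consists of a single Bin atom
print(table.concat(adjust_bins({ "Bin" }), ","))  --> Ord
```

In a+b the Bin survives because it sits between two Ord atoms; in ions such as Cu^{++} both + atoms end up as Ord.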

A compromise would be to change "every binary operator appearing in a superscript or subscript" to type Ordinary (or text). This would take care of my use case, which is common enough to be mentioned in the last answer at https://tex.stackexchange.com/questions/82155/star-vs-ast-in-formulas-which-one-to-use, where it is described as U+002A ASTERISK ("superscript-like"), in contrast to U+2217 ASTERISK OPERATOR. (It seems that the MathML code uses this ASTERISK OPERATOR, which looks perfect even though the symbol shouldn't be an operator in this context.)

This suggestion would also handle the chemical notation for ions, such as Cu^{++} or SO_4^{--}. And in the cases where one really does want a binary operator in a subscript or superscript, treating it incorrectly as type Ordinary would result in mispositionings so small that they would be hard to see in the sub- or superscript.

So is it possible to write a configuration file that changes "every binary operator appearing in a superscript or subscript" into type Ordinary (or text)?
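The proposal above can be sketched in Lua on a hypothetical, simplified tree representation of MathML: plain tables with name, text, attr and children fields, not the LuaXML DOM that make4ht actually uses. Every <mo> found in the script positions of <msub>, <msup> or <msubsup> becomes an upright <mi>, i.e. it is treated as Ordinary, while operators in the base and elsewhere are left alone. The function name demote_script_operators is made up for this illustration.

```lua
-- Hypothetical tree format (not the LuaXML DOM used by make4ht):
--   { name = "mi", text = "x" }
--   { name = "msup", children = { base, superscript } }
local script_parents = { msub = true, msup = true, msubsup = true }

-- Turn every <mo> inside a subscript or superscript into an upright <mi>.
local function demote_script_operators(node, in_script)
  if node.name == "mo" and in_script then
    node.name = "mi"
    node.attr = node.attr or {}
    node.attr.mathvariant = "normal"
  end
  for i, child in ipairs(node.children or {}) do
    -- the first child of msub/msup/msubsup is the base, the rest are scripts
    local child_in_script = in_script or (script_parents[node.name] ~= nil and i > 1)
    demote_script_operators(child, child_in_script)
  end
  return node
end

-- ln x^* : the function-application <mo> stays, the starred <mo> becomes <mi>
local tree = { name = "math", children = {
  { name = "mi", text = "ln" },
  { name = "mo", text = "\226\129\161" }, -- U+2061 FUNCTION APPLICATION (UTF-8)
  { name = "msup", children = {
    { name = "mi", text = "x" },
    { name = "mo", text = "*" },
  } },
} }
demote_script_operators(tree)
print(tree.children[2].name, tree.children[3].children[2].name)  --> mo	mi
```

Because the walk is recursive, operators nested in an <mrow> inside a script are converted too.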

michal-h21 commented 2 years ago

I've found that I should have used <mi> instead of <mtext>.

I think that we can convert every <mo> element that is the only child of its parent to <mi>, because in all cases where <mo> is really used as an operator, it should be placed next to other elements on the same level. If an element contains multiple <mo> elements but nothing else, we must convert them to <mtext>. This is the case with Cu^{++}: LO fails to render it correctly as <mi>++</mi>!

You can try this build file, build.lua, which does that:

local domfilter = require "make4ht-domfilter"

local function just_operators(list)
  -- return the number of <mo> elements in the list; the caller compares it
  -- with the list length to check whether the list contains only <mo>
  local mo = 0
  for _, x in ipairs(list) do
    if x:get_element_name() == "mo" then mo = mo + 1 end
  end
  return mo
end
local process = domfilter {
  function(dom)
    for _, x in ipairs(dom:query_selector("mo")) do
      local siblings = x:get_siblings()
      -- test if current element list contains only <mo>
      if just_operators(siblings) == #siblings then
        if #siblings == 1 then
          -- one <mo> translate to <mi>
          x._name = "mi"
          x:set_attribute("mathvariant", "normal")
        else
          -- multiple <mo> translate to <mtext>
          local text = {}
          for _, el in ipairs(siblings) do
            text[#text+1] = el:get_text()
          end
          -- replace the text of the first <mo> with the concatenated text
          -- content of all <mo> elements
          x._children = {}
          local text_el = x:create_text_node(table.concat(text))
          x:add_child_node(text_el)
          -- change <mo> to <mtext>
          x._name = "mtext"
          -- remove subsequent <mo>
          for i = 2, #siblings do
            siblings[i]:remove_node()
          end
        end
      end
    end
    return dom
  end
}

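-- apply the filter to the .4om files, which hold the MathML for the ODT output
Make:match("4om$", process)
-- uncomment to apply the same filter to HTML output
-- Make:match("html$", process)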

Compile using make4ht -c config.cfg -e build.lua -f odt filename.
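The decision that build.lua makes for each group of sibling elements can also be tried without make4ht. The following is a hypothetical, dependency-free Lua sketch of the same rule, using plain tables such as { name = "mo", text = "+" } in place of LuaXML DOM nodes; rewrite_operator_only_siblings is an illustrative name, not part of make4ht. Unlike the build file, it returns a new list instead of modifying a DOM in place.

```lua
-- Hypothetical stand-in for DOM nodes: { name = "mo", text = "+" }.
-- Returns the (possibly rewritten) list of sibling elements.
local function rewrite_operator_only_siblings(siblings)
  if #siblings == 0 then return siblings end
  -- leave the list alone unless every sibling is an <mo>
  for _, el in ipairs(siblings) do
    if el.name ~= "mo" then return siblings end
  end
  if #siblings == 1 then
    -- a lone operator becomes an upright identifier (type Ordinary)
    return { { name = "mi", text = siblings[1].text, attr = { mathvariant = "normal" } } }
  end
  -- several operators, e.g. the ++ of Cu^{++}, are merged into one <mtext>
  local parts = {}
  for _, el in ipairs(siblings) do
    parts[#parts + 1] = el.text
  end
  return { { name = "mtext", text = table.concat(parts) } }
end

local out = rewrite_operator_only_siblings({ { name = "mo", text = "+" }, { name = "mo", text = "+" } })
print(out[1].name .. " " .. out[1].text)  --> mtext ++
```

A list that mixes operators with other elements, such as a + b, is returned unchanged, which matches the build file's check that every sibling is an <mo>.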

gl-utah commented 2 years ago

Thank you for working so much on this! Once I changed the first word in the build.lua file from "llocal" to "local", it worked wonderfully for LibreOffice. (When I used LibreOffice to convert the .odt file to .docx, Microsoft Word unfortunately set the "ln" in italics, but I guess this is a bug in Microsoft Word.) The config.cfg file I used was

\Preamble{xhtml}
\catcode`\:=11
\Configure{MathClass}{1}{*}{<\a:mathml mi\Hnewline \mml:class="MathClass-op">}{</\a:mathml mi><mo> &\#x2061;<!--FUNCTION APPLICATION--></mo>}{}
\catcode`\:=12
\begin{document}
\EndPreamble

In other words, I decided not to include the line

\Configure{MathClass}{2}{*}{<\a:mathml mtext\Hnewline \mml:class="MathClass-op">}{</\a:mathml mtext>}{}

Did I make the right decision about which config.cfg file to use?

michal-h21 commented 2 years ago

Ah, you are right, there was a typo in the code. I've updated it and removed the spurious l from local. Regarding the MS Word rendering, I am not sure whether the issue is caused by Word itself or by the LO export. You can try to open the ODT file directly in Word to see how it handles it. In my test, Office 365 handled the original superscript issue better.

Regarding the config file, yes, you don't need \Configure{MathClass}{2}. It outputs binary operators as <mtext> instead of <mo>, so it would prevent the Lua code, which looks for <mo> elements, from fixing superscripts. More importantly, it could result in the wrong rendering of operators that really are used as operators.

gl-utah commented 2 years ago

I'm glad I understood which config.cfg file to use. When I open the ODT file directly in Word, Word correctly sets the "ln" in an upright font, solving that problem, but with this procedure Word omits the superscript. In other words, when LibreOffice exports to DOCX, Word sets "ln" in the wrong font but shows the superscript; when Word opens the ODT file directly, it sets "ln" in the correct font but omits the * superscript.

But these are LO and/or Word problems, not problems with make4ht. I really appreciate your help with make4ht!

michal-h21 commented 2 years ago

Ah, that's bad. But I am not surprised; it is quite common for word processors to have issues with math content that works without problems in Firefox/MathJax. And it is almost impossible to find out what the problem is, so debugging it can be a bit frustrating. I hope that these issues are not too serious for you.

gl-utah commented 2 years ago

Thank you so much! I have a few other problems but I'll open another issue for them.