michal-h21 / make4ht

Build system for tex4ht

Make:match fails if there is a footnote #145

Open u-fischer opened 4 months ago

u-fischer commented 4 months ago

I'm trying to extract all the mathml into an extra file. Based on https://chat.stackexchange.com/transcript/41?m=65070731#65070731 I tried with this extract-math.mk4:

local domfilter = require "make4ht-domfilter"

local process = domfilter {"mathmlfixes", -- fix mathml first
  function(dom, par)
    local filename = par.input .. "-mathml.mml"
    local f = io.open(filename, "w")
    for count, math in ipairs(dom:query_selector "math") do
      f:write("\n"..count.."\n")
      f:write(math:serialize())
    end
    f:close()
    return dom
  end
}

Make:match("html", process)

I then call it for test-utf8.tex with make4ht -l -e extract-math.mk4 test-utf8 "mathml"

This works fine for

\documentclass{article}
\begin{document}
$a=b$ and $x=y$

%a\footnote{blub}

\end{document}

and the file contains

1
<math display='inline' xmlns='http://www.w3.org/1998/Math/MathML'><mrow><mi>a</mi> <mo class='MathClass-rel' stretchy='false'>=</mo> <mi>b</mi></mrow></math>
2
<math display='inline' xmlns='http://www.w3.org/1998/Math/MathML'><mrow><mi>x</mi> <mo class='MathClass-rel' stretchy='false'>=</mo> <mi>y</mi></mrow></math>

But as soon as I uncomment the footnote in the example above the file is empty.

michal-h21 commented 4 months ago

The build file expects only one output file, but with footnotes we get a separate HTML file for each footnote by default, so each footnote overwrites the .mml file. Footnotes can be placed in the main HTML file using the "fn-in" option, but it can be a good idea to support multiple files anyway.
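
For reference, the "fn-in" route would mean passing the option next to "mathml" on the command line, for example make4ht -l -e extract-math.mk4 test-utf8 "mathml,fn-in". A minimal sketch of requesting it from the build file instead, assuming the settings_add function described in the make4ht documentation behaves as documented there (the option string is appended to the tex4ht.sty options):

```lua
-- extract-math.mk4 (sketch only): request the fn-in option from the build file.
-- Assumption: settings_add appends this string to the tex4ht.sty options
-- given on the command line, so "mathml" still has to be passed there.
settings_add {
  tex4ht_sty_par = ",fn-in"
}
```

With footnotes kept in the main HTML file, the original single-file version of the DOM filter should no longer be overwritten by footnote files.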

The following build file appends MathML code from all generated HTML files:

local domfilter = require "make4ht-domfilter"
local mkutils = require "mkutils"

local process = domfilter {"mathmlfixes", -- fix mathml first
  function(dom, par)
    -- With several HTML output files, overwrite the .mml file only when processing
    -- the main file; for every other file, append MathML to the existing .mml file.
    local current_name = mkutils.remove_extension(par.filename)
    local mode = "a"
    if current_name == par.input then mode = "w" end
    local filename = par.input .. "-mathml.mml"
    local f = io.open(filename, mode)
    for count, math in ipairs(dom:query_selector "math") do
      f:write("\n"..count.."\n")
      f:write(math:serialize())
    end
    f:close()
    return dom
  end
}

Make:match("html", process)

u-fischer commented 2 months ago

Sorry for the late feedback, I got busy and it slipped my mind. The suggested change works fine, but sadly I can't use it for the intended use case as tex4ht doesn't work with the latex-lab code ;-(.

michal-h21 commented 2 months ago

How can I try the latex-lab code? What is broken?

u-fischer commented 2 months ago

Well basically I'm trying to inject the hash we calculate into the output, but tex4ht disables the latex-lab-code (as it ignores \DocumentMetadata) and if I load it manually it errors. As an example:

\RequirePackage{tagpdf-base}
\RequirePackage{latex-lab-testphase-math}
\documentclass{article}

\AtBeginDocument{\Configure{math-xmlns}
  {xmlns="http://www.w3.org/1998/Math/MathML" hash="abc" source="blub"}}

\begin{document}

$a=1$ and 

$x=2$

a\footnote{blub $f = 3$}

\end{document}

compiled with make4ht test "mathml" gives

! You can't use `\unless' before `\relax'.
<to be read again> 
\ifmeasuring@ 
l.1468 ...64748494A4B4C4D4E4F505152535455565758595A}

and various follow-up errors.

michal-h21 commented 2 months ago

I see. I was able to limit the number of errors to just one, with these config files.

usepackage.4ht:

% usepackage.4ht (2024-04-18-14:01), generated from tex4ht-4ht.tex
% Copyright 2003-2009 Eitan M. Gurari
% Copyright 2009-2024 TeX Users Group
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either
% version 1.3c of this license or (at your option) any
% later version. The latest version of this license is in
%   http://www.latex-project.org/lppl.txt
% and version 1.3c or later is part of all distributions
% of LaTeX version 2005/12/01 or later.
%
% This work has the LPPL maintenance status "maintained".
%
% The Current Maintainer of this work
% is the TeX4ht Project <http://tug.org/tex4ht>.
%
% If you modify this program, changing the
% version identification would be appreciated.
\immediate\write-1{version 2024-04-18-14:01}

   \def\:temp{tex4ht}\ifx \:temp\@currname
   \:warning{\string\usepackage{tex4ht} again?}
   \def\:temp#1htex4ht.def,tex4ht.sty#2!*?: {\def\:temp{#2}}
\expandafter\:temp \@filelist htex4ht.def,tex4ht.sty!*?: %
\ifx \:temp\empty  \else
   \:warning{if
    \string\RequirePackage[tex4ht]{hyperref} or
    \string\usepackage[tex4ht]{hyperref} was
    used try instead, repectively,
    \string\RequirePackage{hyperref} or
    \string\usepackage{hyperref}}
\fi

\fi
\gdef\a:usepackage{\use:package ,!*?: }
\gdef\use:package#1,{%
   \if :#1:\def\:temp##1!*?: {}\else
      \def\:temp{#1}\ifx \@currname\:temp
             \def\:temp##1!*?: {\input usepackage.4ht  }%
      \else \let\:temp=\use:package \fi
   \fi \:temp}
\Configure{PackageHooks}{titlesec.sty}{titlesec-hooks.4ht}
\Configure{PackageHooks}{multibib.sty}{multibib-hooks.4ht}
\Configure{PackageHooks}{biblatex-chicago.sty}{biblatex-chicago-hooks.4ht}
\Configure{PackageHooks}{cleveref.sty}{cleveref-hooks.4ht}
\Configure{PackageHooks}{xr.sty}{xr-hooks.4ht}
\Configure{PackageHooks}{xr-hyper.sty}{xrhyper-hooks.4ht}
\Configure{PackageHooks}{eso-pic.sty}{esopic-hooks.4ht}
\Configure{PackageHooks}{showframe.sty}{showframe-hooks.4ht}
\Configure{PackageHooks}{expl3.sty}{expl3-hooks.4ht}
\Configure{PackageHooks}{savetrees.sty}{savetrees-hooks.4ht}
\Configure{PackageHooks}{newcomputermodern.sty}{newcomputermodern-hooks.4ht}
\Configure{PackageHooks}{newcomputermodern.sty}{newcomputermodern-hooks.4ht}
\Configure{PackageHooks}{fontawesome5-utex-helper.sty}%
{fontawesome5-utex-helper-hooks.4ht}
\Configure{PackageHooks}{fontawesome5.sty}{fontawesome5-hooks.4ht}
\Configure{PackageHooks}{biblatex.sty}{biblatex-hooks.4ht}
\Configure{PackageHooks}{xeCJK.sty}{xecjk-hooks.4ht}
\Configure{PackageHooks}{unicode-math.sty}{unicode-math-hooks.4ht}
\Configure{PackageHooks}{ctex.sty}{ctex-hooks.4ht}
\AddToHook{class/ctexart/before}{\input{ctexart-hooks.4ht}}
\Configure{PackageHooks}{luatexja.sty}{luatexja-hooks.4ht}
\Configure{PackageHooks}{luatexja-fontspec.sty}{luatexja-hooks.4ht}
\Configure{PackageHooks}{polyglossia.sty}{polyglossia-hooks.4ht}
\Configure{PackageHooks}{fontspec.sty}{fontspec-hooks.4ht}
\Configure{PackageHooks}{tikz.sty}{tikz-hooks.4ht}
\Configure{PackageHooks}{pgf.sty}{pgf-hooks.4ht}
\Configure{PackageHooks}{pdfbase.sty}{pdfbase-hooks.4ht}
\Configure{PackageHooks}{pdfx.sty}{pdfx-hooks.4ht}
\Configure{PackageHooks}{lua-widow-control.sty}{lua-widow-control-hooks.4ht}
\Configure{PackageHooks}{tagpdf.sty}{tagpdf-hooks.4ht}
\Configure{PackageHooks}{accessibility.sty}{accessibility-hooks.4ht}
\Configure{PackageHooks}{embedfile.sty}{embedfile-hooks.4ht}
\Configure{PackageHooks}{breakurl.sty}{breakurl-hooks.4ht}
\Configure{PackageHooks}{hyperref.sty}{hyperref-hooks.4ht}
\Configure{PackageHooks}{bookmark.sty}{bookmark-hooks.4ht}
\Configure{PackageHooks}{draftwatermark.sty}{draftwatermark-hooks.4ht}
\AddToHook{package/tabu/before}{\RequirePackage{tabularx}}
\Configure{PackageHooks}{caption.sty}{caption-hooks.4ht}
\Configure{PackageHooks}{footnotebackref.sty}{footnotebackref-hooks.4ht}
\AddToHook{package/doc/before}{\SUPOff}
\AddToHook{package/doc/after}{\SUPOn}
\AddToHook{package/hypdoc/before}{\SUPOff}
\AddToHook{package/hypdoc/after}{\SUPOn}
\Configure{PackageHooks}{mathtools.sty}{mathtools-hooks.4ht}
\Configure{PackageHooks}{babel.sty}{babel-sty-hooks.4ht}
\Configure{PackageHooks}{minted.sty}{minted-sty-hooks.4ht}
\Configure{PackageHooks}{xyling.sty}{xyling-hooks.4ht}
\Configure{PackageHooks}{graphics.sty}{graphics-hooks.4ht}
\Configure{PackageHooks}{graphbox.sty}{graphbox-hooks.4ht}
\Configure{PackageHooks}{xcolor.sty}{xcolor-hooks.4ht}
\Configure{PackageHooks}{imakeidx.sty}{imakeidx-hooks.4ht}
\Configure{PackageHooks}{fancyhdr.sty}{fancyhdr-hooks.4ht}
\Configure{PackageHooks}{exerquiz.sty}{exerquiz-hooks.4ht}
\Configure{PackageHooks}{hyperxmp.sty}{hyperxmp-hooks.4ht}
\Configure{PackageHooks}{datetime2.sty}{datetime2-hooks.4ht}
\Configure{PackageHooks}{latex-lab-testphase-math.sty}{latex-lab-testphase-math-hooks.4ht}

\endinput

It just registers the following file latex-lab-testphase-math-hooks.4ht, to be loaded once the latex-lab-testphase-math package is loaded:

\ExplSyntaxOn
\:AtEndOfPackage{
\RequirePackage{amsmath}
 \cs_set_protected:Npn \__tag_whatsits: {} 
}
\ExplSyntaxOff

Just requiring amsmath before \begin{document} fixed most errors. The rest were fixed by redefining \__tag_whatsits:, except this one:

  ! LaTeX Error: Control sequence \__tag_whatsits: already defined.

The resulting HTML code looks fine; the math element has the source attribute.

u-fischer commented 2 months ago

OK, loading amsmath earlier clearly helped ;-). That is something that we could imho do rather easily in the latex-lab code.

With all these changes, the following compiles with make4ht:

\DocumentMetadata{testphase={phase-III,math}}
\RequirePackage{amsmath}
\documentclass[12pt]{article}
\ExplSyntaxOn
\socket_new_plug:nnn{tagsupport/math/inline/formula/begin}{make4ht}
 {#1\tl_show:e{???\detokenize{#1}???}}
\cs_if_exist:NT\HCode 
 {\AssignSocketPlug{tagsupport/math/inline/formula/begin}{make4ht}}
\ExplSyntaxOff  
\DebugSocketsOn
\begin{document}
some math $a=\int f(x)$ 
\end{document}

But if I add display math or an amsmath environment such as align, it errors:

! Extra }, or forgotten $.
\endequation* ...:endequation*:\endcsname \egroup 
                                                  \csname b:equation*\endcsn...
l.16 \[a=\int f(x)\]

A second problem is how to access the math content. If I use make4ht -l test, then the socket shows ???a=\int f(x)??? in the log and that is fine. But with make4ht -l test "mathml" I get something like this (besides lots of other math-grabbing output):

 ???\aftergroup \b:mth \c:mth \fi \bool_if:NF \l__math_collected_bool
{\bool_set_true:N \l__math_collected_bool \__math_grab_dollar:w }a=\int
f(x)???.

And it is not trivial to get the real math content from it. I wonder whether make4ht could make use of the math grabbing code, if it is there, instead of patching the math?

michal-h21 commented 2 months ago

Instead of redefining mkparams.lua, you can add the following line to the .mk4 file to request the lualatex-dev engine:

Make:htlatex {htlatex = "dvilualatex-dev"} 

The htlatex option can take any command, as long as it produces a DVI file. You can add multiple Make:htlatex calls to request multiple compilations, but I don't think it is necessary in this case.
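
As an illustration of the multiple-compilation case mentioned above, a build file could simply repeat the call. This is only a sketch; two runs is an arbitrary choice, and nothing else about the build sequence is implied:

```lua
-- Sketch: request two compilations with the development LuaLaTeX format in DVI mode.
-- Each Make:htlatex call asks for one more LaTeX run.
Make:htlatex {htlatex = "dvilualatex-dev"}
Make:htlatex {htlatex = "dvilualatex-dev"}
```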

I will add a declaration of \__tag_whatsits: {}, but I am not sure about \DocumentMetadata, or rather the packages loaded by phase-III-latex-lab-testphase.ltx. Most command and environment redefinitions are done in the begindocument/before hook, but this happens after the original commands were redefined in the .4ht files, which are loaded just before \begin{document}. So all hooks for inserting HTML tags are lost in the commands and environments redefined in these packages. This happens, for example, to footnotes, itemize, etc.

We would need to redefine them again in phase-III-latex-lab-testphase.4ht, if we used \AtBeginDocument:

\AtBeginDocument{
\catcode`\:=11
\makeatletter
%redefinitions here
\catcode`\:=12
\makeatother
}

I even tried to \input{latex.4ht} and other basic files here, but it only led to errors, and itemize and footnotes didn't work anyway. So I am not sure what a good solution is here :/

u-fischer commented 2 months ago

> Instead of redefining mkparams.lua, you can add the following line to the .mk4 file to request the lualatex-dev engine:

You mean the mk4 for the extraction of the mathml? My remark was more on the general side: imho all users should be able to test with an upcoming latex.

> I will add a declaration of \__tag_whatsits: {}

That should not be needed, the next tagpdf update will handle that.

> but I am not sure about \DocumentMetadata, or rather the packages loaded by phase-III-latex-lab-testphase.ltx. Most command and environment redefinitions are done in the begindocument/before hook, but this happens after the original commands were redefined in the .4ht files.

I quite understand that there will be problems. But we are putting all this new code in testphase packages and latex-lab style files so that we can test it and identify problems and then find suitable solutions. All this is not possible with tex4ht if you simply disable \DocumentMetadata and the loading of the latex-lab code. The main goal of the code currently is to enable tagging but we also have in mind to simplify the html output, after all the structures are quite similar, and for this it is important to understand if and how the code that we add can be reused by tex4ht and others.

So please enable \DocumentMetadata, and if you see an error, report it at the tagging-project github.

michal-h21 commented 2 months ago

> I quite understand that there will be problems. But we are putting all this new code in testphase packages and latex-lab style files so that we can test it and identify problems and then find suitable solutions. All this is not possible with tex4ht if you simply disable \DocumentMetadata and the loading of the latex-lab code. The main goal of the code currently is to enable tagging but we also have in mind to simplify the html output, after all the structures are quite similar, and for this it is important to understand if and how the code that we add can be reused by tex4ht and others.
>
> So please enable \DocumentMetadata, and if you see an error, report it at the tagging-project github.

OK, I've enabled \DocumentMetadata in the TeX4ht sources. It doesn't cause fatal errors anymore, just the clash between macros redefined by both TeX4ht and tagpdf. Hopefully, we will be able to fix that.