michal-h21 / make4ht

Build system for tex4ht

domfilter error with graphics using subfile package: "Unbalanced Tag (/p)" #84

Closed: alder711 closed this issue 1 year ago

alder711 commented 1 year ago

Hi, Michal.

I am seeing an issue when I use make4ht to compile LaTeX into HTML, but it only seems to happen when I use the subfiles package.

Content of main.tex:

\documentclass{article}
        \usepackage{graphicx}
        \usepackage{float}
        \graphicspath{{../img/}}
        \usepackage{subfiles}
\begin{document}
        \subfile{sections/section1}
\end{document}

Content of sections/section1.tex:

\documentclass[../main.tex]{subfiles}
\graphicspath{{\subfix{../../img/}}}
\begin{document}
        \begin{figure}[H]
                \centering
                \includegraphics[width=0.5\textwidth]{sample.png}
                \label{fig:sec1-img1}
                \caption{Test Image}
        \end{figure}
\end{document}

Content of my.cfg:

...
\ConfigureEnv{figure}
  {\IgnorePar\EndP\HCode{<figure class="figure">\Hnewline}%
   \bgroup \Configure{float}{\ShowPar}{}{}%
  }
  {\egroup
   \IgnorePar\EndP\HCode{</figure>}\ShowPar
   \par}
  {}{}
...
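The my.cfg above is abridged. For anyone trying to reproduce the report, a minimal self-contained config might look like the sketch below. It assumes the usual TeX4ht wrapper (\Preamble{xhtml} ... \begin{document} ... \EndPreamble) and assumes that nothing from the elided parts of the original file is needed to trigger the problem; only the \ConfigureEnv block is taken from the report.

```latex
% my.cfg: minimal sketch (assumption: the elided "..." parts are not needed)
\Preamble{xhtml}
% Wrap the figure environment in <figure class="figure"> ... </figure>
\ConfigureEnv{figure}
  {\IgnorePar\EndP\HCode{<figure class="figure">\Hnewline}%
   \bgroup \Configure{float}{\ShowPar}{}{}%
  }
  {\egroup
   \IgnorePar\EndP\HCode{</figure>}\ShowPar
   \par}
  {}{}
\begin{document}
\EndPreamble
```

With this file, the same make4ht command line can be used unchanged.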

If I compile the above with make4ht --format html5+latexmk_build --backend tex4ht --loglevel info --shell-escape --config my.cfg main.tex, everything seems fine except for the following warning:

[WARNING] domfilter: DOM parsing of main.html failed:
[WARNING] domfilter: /usr/share/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Unbalanced Tag (/p) [char=559]

And I see the following in the generated main.html:

...
<p class="noindent">   <a 
 id="x1-2r1"></a><figure class="float" 
>

 <img 
src="../img/../img//sample.png" alt="pict"  
 width="172.5pt" > <a 
 id="x1-1doc"></a>
<figcaption class="caption" ><span class="id">Figure 1: </span><span  
class="content">Test Image</span></figcaption><!--tex4ht:label?: x1-2r1 -->

   </p></figure>
...

(note the switched </figure> and </p> tags)

Interestingly, if I change main.tex so that it does not include sections/section1.tex and instead put the figure directly in main.tex, like so:

\documentclass{article}
        \usepackage{graphicx}
        \usepackage{float}
        \graphicspath{{../img/}}
        \usepackage{subfiles}
\begin{document}
        \begin{figure}[H]
                \centering
                \includegraphics[width=0.5\textwidth]{sample.png}
                \label{fig:sec1-img1}
                \caption{Test Image}
        \end{figure}
\end{document}

no warnings or errors are seen, and the generated main.html includes:

...
<figure class='figure' id='-test-image'> 

<a id='x1-2r1'></a>

<p class='noindent'> <img alt='pict' src='../img//sample.png' width='172.5pt' /> <a id='x1-1doc'></a>
<figcaption class='caption'><span class='id'>Figure 1: </span><span class='content'>Test Image</span></figcaption><!-- tex4ht:label?: x1-2r1  -->

   </p></figure>
...

(note </figure> and </p> are correctly positioned)

Used software versions:

I am still figuring out how to use make4ht and would love any help with this. Thanks for all your hard work!

michal-h21 commented 1 year ago

Hi Trevor, I cannot reproduce your exact issue; I don't get the DOM error. But I can see that the configuration for the figure environment doesn't affect the included file. I will need to investigate further, but it is possible that some configurations are redefined when \subfile is used.

michal-h21 commented 1 year ago

Ah, I found that \ConfigureEnv doesn't work in the included document, so a lot of other environments are broken too.

alder711 commented 1 year ago

Ah, so you are saying that at the moment, any \ConfigureEnv configurations in my.cfg are for some reason not being applied to subfiles included in main.tex?

michal-h21 commented 1 year ago

Yes, it seems so. I will need to figure out how to fix that; it shouldn't be hard once I find the cause. I suspect that the \begin and \end commands are redefined in \subfile, so the TeX4ht hooks are not executed.
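One way to test the suspicion that \subfile swaps out \end is to print the meaning of \end once in the main document and once inside the subfile, then compare the two lines in main.log. This is a minimal diagnostic sketch: the MAIN END: and SUBFILE END: prefixes are arbitrary markers chosen here for searching the log, and the figure is left out of section1.tex for brevity.

```latex
% ---- file: main.tex (diagnostic version) ----
\documentclass{article}
\usepackage{graphicx}
\usepackage{float}
\graphicspath{{../img/}}
\usepackage{subfiles}
\begin{document}
% Write the current definition of \end to the terminal and to main.log
\typeout{MAIN END: \meaning\end}
\subfile{sections/section1}
\end{document}

% ---- file: sections/section1.tex (diagnostic version) ----
\documentclass[../main.tex]{subfiles}
\graphicspath{{\subfix{../../img/}}}
\begin{document}
% Same check, but from inside the included subfile
\typeout{SUBFILE END: \meaning\end}
\end{document}
```

After running the same make4ht command, searching main.log for "END:" shows both definitions. If they differ, that would suggest the \end used inside the subfile is not the version TeX4ht has hooked into.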

alder711 commented 1 year ago

That makes sense: I see something similar with listings environments, where I get an undefined control sequence error, but only when the environment is in an included file.

michal-h21 commented 1 year ago

I've found a solution. This version of subfiles.4ht should work:

% subfiles.4ht (2022-04-04-07:06), generated from tex4ht-4ht.tex
% Copyright 2022 TeX Users Group
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either
% version 1.3c of this license or (at your option) any
% later version. The latest version of this license is in
%   http://www.latex-project.org/lppl.txt
% and version 1.3c or later is part of all distributions
% of LaTeX version 2005/12/01 or later.
%
% This work has the LPPL maintenance status "maintained".
%
% The Current Maintainer of this work
% is the TeX4ht Project <http://tug.org/tex4ht>.
%
% If you modify this program, changing the
% version identification would be appreciated.
\immediate\write-1{version 2022-04-04-07:06}

\let\subfiles:origend\end
\def\subfiles:end{%
  \def\:temp{document}
  \ifx\@currenvir\:temp
    \let\choose:begin\@secondoftwo%
    \def\:temp##1{}
    \subfiles@restoreEndFrom\:temp
  \fi%
}
\def\:tempa#1{%
  \ifcsname subfiles@end\endcsname
  \else
    \subfiles@saveEndTo\subfiles@end
  \fi
  \pend:defI\end\subfiles:end
}

\HLet\subfiles@renewEndDocument\:tempa

\Hinput{subfiles}
\endinput

alder711 commented 1 year ago

Cool, it looks like the following steps worked for me on my Arch Linux system:

  1. Replace the contents of /usr/share/texmf-dist/tex/generic/tex4ht/subfiles.4ht with the version suggested above by @michal-h21
  2. Rerun the make4ht command as before

Thanks for the help, @michal-h21! Will this change be included in the next release?

michal-h21 commented 1 year ago

Yes, it is already in TeX4ht sources, so it should be in TeX Live soon.

alder711 commented 1 year ago

Cool. I will close this issue then.