michal-h21 / tex4ebook

Converter from LaTeX to ebook formats (EPUB, MOBI), using tex4ht and texlua scripts.

DT element generated containing P element (fails validation) #109

Open fsparv opened 1 year ago

fsparv commented 1 year ago
$ tex4ebook -v
tex4ebook v0.3g

With this MWE, tex4ebook produces an EPUB that contains invalid HTML:

\documentclass[paper=6in:9in,pagesize=pdftex,headinclude=on,footinclude=on,12pt]{scrbook}
\begin{document}
\begin{description}
    \item[foo]  bar
    \item[baz]  bam
\end{description}
\end{document}

tex4ebook -d ../epub-out -f epub -t ../mwe1.tex

$ java -jar ../../tools/epubcheck-5.0.1/epubcheck.jar ../epub-out/mwe1.epub 
Validating using EPUB version 2.0.1 rules.
ERROR(RSC-005): ../epub-out/mwe1.epub/OEBPS/mwe1.html(12,40): Error while parsing file: element "p" not allowed here; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")
ERROR(RSC-005): ../epub-out/mwe1.epub/OEBPS/mwe1.html(16,40): Error while parsing file: element "p" not allowed here; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")

Check finished with errors
Messages: 0 fatals / 2 errors / 0 warnings / 0 infos

The generated HTML looks like this:

<?xml version='1.0' encoding='utf-8' ?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns='http://www.w3.org/1999/xhtml'> 
<head><title></title> 
<meta content='text/html; charset=utf-8' http-equiv='Content-Type' /> 
<meta content='TeX4ht (https://tug.org/tex4ht/)' name='generator' /> 
<meta content='TeX4ht (https://tug.org/tex4ht/)' name='originator' /> 
<!--  xhtml,charset=utf-8,epub,uni-html4,html  --> 
<meta content='mwe1.tex' name='src' /> 
<link href='mwe1.css' rel='stylesheet' type='text/css' /> 
</head><body>
      <dl class='description'><dt class='description'>
      <!-- l. 5 --><p class='noindent'>
<span class='cmssbx-10x-x-120'>foo</span> </p></dt><dd class='description'>
      <!-- l. 5 --><p class='noindent'>bar
      </p></dd><dt class='description'>
      <!-- l. 6 --><p class='noindent'>
<span class='cmssbx-10x-x-120'>baz</span> </p></dt><dd class='description'>
      <!-- l. 6 --><p class='noindent'>bam</p></dd></dl>

</body></html>

Note in particular:

<dt class='description'>
      <!-- l. 5 --><p class='noindent'>
<span class='cmssbx-10x-x-120'>foo</span> </p></dt>

Note that the XHTML content model for dt differs from that of dd and li: dt allows only (PCDATA | Inline)*, not (PCDATA | Flow)*, so a p element inside dt is invalid. See https://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_listmodule
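To check whether a generated file (or a whole EPUB) still puts block elements inside dt without a full epubcheck run, a rough Python sketch like the following could help. It is not part of tex4ebook, TeX4ht or epubcheck: the function names dt_violations and check_epub are hypothetical, the set of allowed elements is copied from the epubcheck message above, and it assumes every file parses as plain XML (a named entity such as &nbsp; would make the parser fail, since no DTD is loaded). It only checks the dt content model and is no substitute for epubcheck.

```python
"""Rough sketch: report non-inline elements nested inside <dt>.

Assumptions (not from tex4ebook or epubcheck): the allowed names are the
inline elements listed in the epubcheck error message, and every checked
file parses as plain XML. This is not a full XHTML 1.1 validator.
"""
import sys
import zipfile
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Inline elements that epubcheck accepts inside <dt> (copied from its message).
INLINE = {
    "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite",
    "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map",
    "noscript", "object", "q", "samp", "script", "small", "span", "strong",
    "sub", "sup", "tt", "var",
}


def _collect(elem, found):
    """Append the names of non-inline XHTML descendants of elem to found."""
    for child in elem:
        if not isinstance(child.tag, str):
            continue  # comments/PIs (the default parser drops them anyway)
        if not child.tag.startswith(XHTML_NS):
            continue  # e.g. inline SVG: accepted as a whole, not inspected
        name = child.tag[len(XHTML_NS):]
        if name not in INLINE:
            found.append(name)
        _collect(child, found)


def dt_violations(xhtml):
    """Return names of non-inline elements inside any <dt> (str or bytes)."""
    root = ET.fromstring(xhtml)
    found = []
    for dt in root.iter(XHTML_NS + "dt"):
        _collect(dt, found)
    return found


def check_epub(path):
    """Print each (X)HTML file in the EPUB whose <dt> has block content.

    Returns True when no problems were found.
    """
    ok = True
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if name.endswith((".html", ".xhtml", ".htm")):
                bad = dt_violations(epub.read(name))
                if bad:
                    ok = False
                    print(f"{name}: <dt> contains {sorted(set(bad))}")
    return ok


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(0 if check_epub(sys.argv[1]) else 1)
```

Saved under a hypothetical name such as check_dt.py, running it as python check_dt.py ../epub-out/mwe1.epub should report OEBPS/mwe1.html for the output shown above, and report nothing once the dt elements contain only inline content.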

michal-h21 commented 1 year ago

Thanks for the report. I've fixed this in the TeX4ht sources, so it should work correctly soon. In the meantime, you can use this configuration file:

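\Preamble{xhtml}

% make : a letter so that tex4ht internal macros such as \end:itm can be used
\catcode`\:=11
% arguments: before list, after list, before item label, after item label
\ConfigureList{description}%
   % before the list: close any open paragraph, open <dl>,
   % and save the outer list's \end:itm before resetting it
   {\EndP\HCode{<dl \a:LRdir class="description">}%
      \PushMacro\end:itm
      \global\let\end:itm=\empty}
   % after the list: restore \end:itm, then close the last <dd> and the <dl>
   {\PopMacro\end:itm \global\let\end:itm \end:itm
      \EndP\HCode{</dd></dl>}\ShowPar}
   % before each label: close the previous <dd> (\end:itm is empty for the
   % first item), arrange for this item's <dd> to be closed later, open <dt>
   {\end:itm \global\def\end:itm{\EndP\Tg</dd>}\HCode{<dt
        class="description">}\bgroup %\par\ShowPar
      %\bfseries
   }
   % after the label: close the group and the <dt>, open <dd>,
   % and allow a paragraph for the item body
   {\egroup\EndP\HCode{</dt><dd\Hnewline class="description">}\par\ShowPar
   }
\catcode`\:=12

\begin{document}
\EndPreamble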