gutenbergtools / ebookmaker

The Project Gutenberg tool to generate EPUBs and other ebook formats.
GNU General Public License v3.0

added html5 figure markup may strand tags #229

Closed eshellman closed 2 months ago

eshellman commented 5 months ago

While looking at #228, I noticed a validation error in the generated HTML5: https://www.gutenberg.org/cache/epub/13913/pg13913-images.html

okrick commented 5 months ago

The file I uploaded did not have a W3C Validator Error. See https://errata.pglaf.org/d/13913/work/13913-h/13913-h.htm.

This error must have been created when the boilerplate was added.

eshellman commented 5 months ago

2024-06-03 19:00:13,290 CRITICAL #13913 validation error reported by HTML_VALIDATOR: 2024-06-03 19:00:13,293 CRITICAL #13913 "file:/export/sunsite/users/gutenbackend/cache/epub/13913/pg13913-images.html":3874.1-3874.4: error: No “p” element in scope but a “p” end tag seen.

2024-06-03 19:00:19,724 CRITICAL #13913 validation error reported by EPUB_VALIDATOR: 2024-06-03 19:00:19,725 CRITICAL #13913 ERROR(RSC-005): /export/sunsite/users/gutenbackend/cache/epub/13913/pg13913.epub/OEBPS/7418548694526674177_13913-h-14.htm.html(233,72): Error while parsing file: element "div" not allowed here; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")

2024-06-03 19:00:26,360 CRITICAL #13913 ERROR(RSC-005): /export/sunsite/users/gutenbackend/cache/epub/13913/pg13913-images.epub/OEBPS/7418548694526674177_13913-h-14.htm.html(233,72): Error while parsing file: element "div" not allowed here; expected the element end-tag, text or element "a", "abbr", "acronym", "applet", "b", "bdo", "big", "br", "cite", "code", "del", "dfn", "em", "i", "iframe", "img", "ins", "kbd", "map", "noscript", "ns:svg", "object", "q", "samp", "script", "small", "span", "strong", "sub", "sup", "tt" or "var" (with xmlns:ns="http://www.w3.org/2000/svg")

2024-06-03 19:00:40,026 CRITICAL #13913 validation error reported by EPUB_VALIDATOR: 2024-06-03 19:00:40,028 CRITICAL #13913 ERROR(RSC-005): /export/sunsite/users/gutenbackend/cache/epub/13913/pg13913-images-3.epub/OEBPS/7418548694526674177_13913-h-14.htm.xhtml(234,68): Error while parsing file: element "figure" not allowed here; expected the element end-tag, text, element "a", "abbr", "area", "audio", "b", "bdi", "bdo", "br", "button", "canvas", "cite", "code", "data", "datalist", "del", "dfn", "em", "embed", "i", "iframe", "img", "input", "ins", "kbd", "label", "link", "map", "mark", "meta", "meter", "ns1:switch", "ns2:math", "ns3:svg", "object", "output", "picture", "progress", "q", "ruby", "s", "samp", "script", "select", "slot", "small", "span", "strong", "sub", "sup", "template", "textarea", "time", "u", "var", "video" or "wbr" (with xmlns:ns1="http://www.idpf.org/2007/ops" xmlns:ns2="http://www.w3.org/1998/Math/MathML" xmlns:ns3="http://www.w3.org/2000/svg") or an element from another namespace
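
Both kinds of message above are consistent with a block element (a "div" or "figure") ending up inside a "p": the EPUB checker rejects the nesting, while an HTML5 parser closes the open "p" implicitly when it reaches the block element, so the later "p" end tag has nothing left to close. Whether that is exactly what happened in #13913 is an assumption. The sketch below is a deliberately simplified model of the HTML5 rule, not ebookmaker or validator code; `CLOSES_P`, `StrayPEndTagFinder` and `find_stray_p_end_tags` are hypothetical names.

```python
from html.parser import HTMLParser

# Small, incomplete subset of the start tags that implicitly close an open <p>
# under HTML5 parsing rules; chosen for illustration only.
CLOSES_P = {"div", "figure", "p", "ul", "ol", "table", "blockquote"}


class StrayPEndTagFinder(HTMLParser):
    """Hypothetical helper: record </p> tags that have no <p> left to close."""

    def __init__(self):
        super().__init__()
        self.p_open = False
        self.stray = []  # (line, column) of each stray </p>

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P:
            self.p_open = False  # an open <p> is closed implicitly here
        if tag == "p":
            self.p_open = True

    def handle_endtag(self, tag):
        if tag != "p":
            return
        if self.p_open:
            self.p_open = False
        else:
            self.stray.append(self.getpos())


def find_stray_p_end_tags(markup):
    """Return the positions of </p> tags that this simplified model considers stray."""
    finder = StrayPEndTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.stray


# A <div> nested in a <p>: the trailing </p> is reported as stray.
nested = "<p>caption<div class='fig'><img src='a.png'></div></p>"
print(find_stray_p_end_tags(nested))
```

A real HTML5 parser tracks a full stack of open elements and many more tag categories; this model only follows a single open "p", which is enough to show why the end tag becomes stranded.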

eshellman commented 5 months ago

@okrick The source is fine; that's why I opened an issue.

okrick commented 5 months ago

Thanks for clarifying.

eshellman commented 2 months ago

It turns out that the problem with this source file is that it declares itself to be an XML document: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">, but it does not close its tags and is thus not a valid XML document. Apparently the W3C validator now ignores the namespace declaration and parses the file with its HTML5 parser; ebookmaker uses a permissive XML parser that closes the tags itself, resulting in HTML5 that does not pass validation. This is a "can't fix", although I'll complain to the maintainers of the validator.
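
For anyone who wants to check a source file for this condition before uploading, a minimal sketch follows. It uses the strict XML parser from the Python standard library as a stand-in; it is not the parser ebookmaker uses, and `is_well_formed_xml` and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Declares the XHTML namespace but leaves both <p> elements unclosed.
unclosed = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<body><p>one<p>two</body></html>"
)

# The same content with every tag closed.
closed = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<body><p>one</p><p>two</p></body></html>"
)

print(is_well_formed_xml(unclosed))
print(is_well_formed_xml(closed))
```

A file that fails this check while declaring the XHTML namespace is in the situation described above: an HTML5 parser and a permissive XML parser may each repair it differently.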