jgm / pandoc

Universal markup converter
https://pandoc.org

Boolean HTML attributes not polyglot when using --self-contained #5780

Open marrus-sh opened 5 years ago

marrus-sh commented 5 years ago

Pandoc version: 2.7.3

I am compiling Markdown with pandoc to standalone HTML, using a custom template of my own with the --template flag. The full call is something along the lines of:

pandoc -f markdown-smart -t html5-smart --standalone \
    --template "template.xhtml" --filter pandoc-citeproc \
    --self-contained --section-divs --mathml

In the template I have code like the following:

<div hidden="">

However, the HTML which results is instead:

<div hidden>

(The hidden attribute has lost its ="".)

This is equivalent HTML, but it is not valid polyglot markup: XML requires every attribute to have a quoted value, so the output is not well-formed XML. One can get around this by instead writing:

<div hidden="hidden">

(Which is defined by HTML5 to be equivalent.) But it would be nice if pandoc supported polyglot empty attributes as well.
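To make the difference concrete, here is a minimal sketch that checks the three spellings for XML well-formedness using Python's standard-library XML parser. The helper name is_well_formed_xml is made up for this illustration, and the fragments are closed with </div> only so that each one is a complete element.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The three spellings discussed above, each wrapped into a complete element.
for fragment in (
    '<div hidden=""></div>',
    '<div hidden="hidden"></div>',
    '<div hidden></div>',
):
    print(fragment, is_well_formed_xml(fragment))
```

Only the bare hidden form is rejected, because an XML attribute must always carry a quoted value.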

marrus-sh commented 5 years ago

Doing some additional testing, it seems like --self-contained specifically is the culprit here; correct polyglot markup is produced when I remove that option.

(As an MWE, compare

echo []{hidden=""} | pandoc -f markdown -t html --standalone

with

echo []{hidden=""} | pandoc -f markdown -t html --self-contained

.)

mb21 commented 5 years ago

We're using the tagsoup library's data model to implement the --self-contained option. Essentially, we re-parse the string generated by the HTML writer using tagsoup, replace all external resources with data: URIs, then render the result back to HTML with tagsoup. See SelfContained.hs.
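For readers unfamiliar with the replacement step, the sketch below shows how a resource can be inlined as a base64 data: URI. It is a generic Python illustration, not pandoc's Haskell code; the function name to_data_uri and the sample payload are assumptions made for the example.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# A tiny stand-in for an external stylesheet that would otherwise be linked.
stylesheet = b"body { margin: 0; }"
print(to_data_uri(stylesheet, "text/css"))
```

A self-contained document would then reference such a URI wherever the original href or src pointed at an external file.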

It looks like tagsoup has an old open issue about this, https://github.com/ndmitchell/tagsoup/issues/23, so there's currently no workaround?

jgm commented 5 years ago

Does <div hidden=""> have the same meaning in HTML as <div hidden="hidden">? If so, we could replace the first with the second automatically before calling tagsoup.

mb21 commented 5 years ago

> Does <div hidden=""> have the same meaning in HTML as <div hidden="hidden">?

Yes, see https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes#Boolean_Attributes

But we would have to hardcode the list of boolean attributes, which is what the tagsoup issue was supposed to do. It would be ideal to fix it there, I suppose.
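The pre-processing idea discussed above (rewrite name="" to name="name" for known boolean attributes before the markup is re-parsed) might look roughly like the following. This is a Python sketch of the idea only, not pandoc or tagsoup code: the function name, the regular expressions, and the attribute list are all assumptions, and the list is deliberately incomplete. The regex-based tag matching is also naive and would be confused by a > inside an attribute value.

```python
import re

# Hypothetical, deliberately incomplete list of HTML boolean attributes.
BOOLEAN_ATTRIBUTES = (
    "hidden", "disabled", "checked", "selected",
    "readonly", "required", "autofocus", "multiple",
)

# Naive start-tag matcher: "<", a letter, then anything up to the next ">".
_START_TAG = re.compile(r"<[A-Za-z][^<>]*>")

# A listed attribute name preceded by whitespace and followed by ="".
_EMPTY_BOOLEAN = re.compile(
    r'(?<=\s)(%s)=""' % "|".join(BOOLEAN_ATTRIBUTES), re.IGNORECASE
)


def expand_boolean_attributes(html: str) -> str:
    """Rewrite name="" as name="name" inside start tags only."""
    def fix_tag(match: re.Match) -> str:
        return _EMPTY_BOOLEAN.sub(
            lambda m: f'{m.group(1)}="{m.group(1)}"', match.group(0)
        )
    return _START_TAG.sub(fix_tag, html)


print(expand_boolean_attributes('<div hidden="">text</div>'))
```

Restricting the substitution to start tags leaves text content such as a literal hidden="" untouched, and requiring leading whitespace avoids rewriting attributes like data-hidden="".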