Open gabestein opened 3 years ago
Looked into this a little bit today. Consider this bit of HTML, interpreted as an .md file:
<blockquote><p><em>Hello</em></p></blockquote>
If you convert this to the Pandoc AST using pandoc -t json -f markdown input.md, you get:
{"pandoc-api-version":[1,22],"meta":{},"blocks":[{"t":"RawBlock","c":["html","<blockquote>"]},{"t":"RawBlock","c":["html","<p>"]},{"t":"Plain","c":[{"t":"RawInline","c":["html","<em>"]},{"t":"Str","c":"Hello"},{"t":"RawInline","c":["html","</em>"]}]},{"t":"RawBlock","c":["html","</p>"]},{"t":"RawBlock","c":["html","</blockquote>"]}]}
Each opening (and closing!) HTML tag is interpreted as its own RawBlock/RawInline element. This is apparently expected behavior per the Pandoc manual:
...pandoc can process “bare” raw HTML and TeX, [but] the result is often interspersed raw elements and normal textual elements...
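To see the fragmentation concretely, here is a minimal Python sketch that walks the AST quoted above (copied verbatim from the -f markdown output) and lists every raw HTML element it finds. The helper name raw_html_elements is hypothetical and is not part of Pandoc or of our importer.

```python
import json

# AST copied from the `-f markdown` output quoted above.
AST_JSON = r'''{"pandoc-api-version":[1,22],"meta":{},"blocks":[{"t":"RawBlock","c":["html","<blockquote>"]},{"t":"RawBlock","c":["html","<p>"]},{"t":"Plain","c":[{"t":"RawInline","c":["html","<em>"]},{"t":"Str","c":"Hello"},{"t":"RawInline","c":["html","</em>"]}]},{"t":"RawBlock","c":["html","</p>"]},{"t":"RawBlock","c":["html","</blockquote>"]}]}'''


def raw_html_elements(node):
    """Yield (element type, html text) for each html RawBlock/RawInline under node."""
    if isinstance(node, dict):
        if node.get("t") in ("RawBlock", "RawInline"):
            fmt, text = node["c"]
            if fmt == "html":
                yield node["t"], text
            return
        # Any other element: keep descending into its contents.
        for value in node.values():
            yield from raw_html_elements(value)
    elif isinstance(node, list):
        for item in node:
            yield from raw_html_elements(item)


doc = json.loads(AST_JSON)
found = list(raw_html_elements(doc["blocks"]))
for kind, html in found:
    print(kind, html)
```

For this input the loop prints six entries, one per opening or closing tag, and none of them contains the text "Hello".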
I don't know why anyone would want this! However, if you invoke Pandoc with -f markdown_strict instead, you get:
{"blocks":[{"t":"RawBlock","c":["html","<blockquote><p><em>Hello</em></p></blockquote>"]}],"pandoc-api-version":[1,20],"meta":{}}
which is what our importer is designed to handle: when it sees a RawBlock with type html, it passes the contents wholesale into a Pandoc subprocess to be parsed and transformed. We originally imported Markdown as markdown_strict and at some point switched to markdown to gain flexibility elsewhere; this is an unintended side effect of that change.
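The RawBlock handling described here can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the importer's actual code: the names html_to_blocks and expand_raw_html are hypothetical, and the sketch assumes a pandoc binary is on the PATH. The only Pandoc behavior it relies on is the html reader and the json writer.

```python
import json
import subprocess


def html_to_blocks(html):
    """Parse an HTML fragment into Pandoc AST blocks via a pandoc subprocess."""
    result = subprocess.run(
        ["pandoc", "-f", "html", "-t", "json"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)["blocks"]


def expand_raw_html(blocks, parse_html=html_to_blocks):
    """Replace each html RawBlock with the blocks parsed from its contents."""
    expanded = []
    for block in blocks:
        if block["t"] == "RawBlock" and block["c"][0] == "html":
            # Hand the raw HTML over wholesale, as the comment describes.
            expanded.extend(parse_html(block["c"][1]))
        else:
            expanded.append(block)
    return expanded
```

With the markdown_strict output, parse_html receives the whole blockquote in one call. With the markdown output, it would receive "<blockquote>", "<p>", "</p>" and "</blockquote>" as four separate fragments, so the nested content never ends up inside the blockquote, which seems consistent with the import error reported in this issue.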
So the remedy is one of:
1. [...]
2. [...] markdown format gives us.
3. [...] .md file as either markdown or markdown_strict.
We need to do (1) anyway, but that's a much larger project that could consume a cycle or more. I think (2) is not really worth our time, but (3) would be easy and potentially more broadly useful.
Discussed 5/11. Likely implementing no. 3 (or no. 3 + extensions) at some point, but not urgent. For now, the workaround is to replace block-level HTML in blockquotes with its Markdown equivalent.
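As a rough sketch of what no. 3 (or no. 3 + extensions) could look like, assuming it means choosing per import whether an .md file is read as markdown or markdown_strict: the function below builds the Pandoc command line for either reader and appends optional extensions using Pandoc's +extension syntax. The function name and parameters are hypothetical.

```python
def pandoc_import_args(path, flavor="markdown", extensions=()):
    """Build a pandoc command line that reads `path` with the chosen Markdown reader."""
    if flavor not in ("markdown", "markdown_strict"):
        raise ValueError(f"unsupported Markdown flavor: {flavor!r}")
    # Pandoc enables extensions by appending +name to the reader name.
    reader = flavor + "".join(f"+{ext}" for ext in extensions)
    return ["pandoc", "-f", reader, "-t", "json", path]


print(pandoc_import_args("input.md", "markdown_strict", ["footnotes"]))
```

The "+ extensions" variant would let an import stay on markdown_strict for raw HTML handling while opting back in to individual features, such as footnotes, that the switch to markdown was meant to provide.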
What went wrong, step-by-step?
Error applying transaction: RangeError: Invalid content for node type blockquote
What did you expect to happen?
It should import correctly.
2020-07-22-student-employment-academic-libraries.md