jgm / pandoc

Universal markup converter
https://pandoc.org

Nesting directive not indenting code block text #8515

Open castedo opened 1 year ago

castedo commented 1 year ago

I found this bug using pandoc but I'm guessing it's a bug in this Haskell module. I learned about this nesting template feature from the README.md of this repo.

STEPS: With input.md.txt and yaml.template.txt do:

pandoc input.md.txt --from=markdown --to=html --template yaml.template.txt

RESULT:

---
title: |+4
    Doc with pre-formatted text
body: |+4
    <p>This is normal text.</p>
    <pre><code>Now here&#39;s some
pre-formatted text
across many lines.</code></pre>
    <p>And finally normal text again.</p>
...

EXPECTED: All five lines of body text to be indented.
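For illustration, the expected body can be sketched in Python by indenting every line of the HTML uniformly. This is only a sketch of the desired output, not of how pandoc or its template engine works; the helper name `yaml_block_scalar` is hypothetical.

```python
import textwrap

# The body HTML from the report above, before any indentation is applied.
body_html = (
    "<p>This is normal text.</p>\n"
    "<pre><code>Now here&#39;s some\n"
    "pre-formatted text\n"
    "across many lines.</code></pre>\n"
    "<p>And finally normal text again.</p>"
)


def yaml_block_scalar(key: str, text: str, indent: int = 4) -> str:
    """Render text as a YAML literal block scalar with every line indented.

    Hypothetical helper for illustration; it mimics the `|+4` header used
    in the template output shown above.
    """
    header = f"{key}: |+{indent}\n"
    return header + textwrap.indent(text, " " * indent)


expected = yaml_block_scalar("body", body_html)
print(expected)
```

Because a YAML parser strips the block scalar's indentation again when loading, indenting the pre-formatted lines here would not change the HTML that a consumer reads back.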

VERSION:

pandoc 2.19.2
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4

This approach could be used to output a valid YAML file that holds the content of pandoc variables, some of which contain HTML, in a structured and diff-friendly format.

jgm commented 1 year ago

This actually comes from something in pandoc: the use of the flush combinator for pre tags in T.P.Writers.Blaze (l. 38). flush tells doclayout to ignore the indentation.

This is intentional, because in most cases, indenting a pre tag will cause unwanted spaces in the output.

This YAML context is an exception, but I don't think it would be good to modify the code, because generally we do want pre elements to be flush....
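A toy model in Python may help show the interaction described here. It is not doclayout's API or implementation: the `Line` class, its `flush` flag and `render_nested` are hypothetical names, and which lines are marked flush is chosen simply to mirror the RESULT in the report.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    # Hypothetical flag: True means "ignore any enclosing indentation".
    flush: bool = False


def render_nested(lines, indent):
    """Indent every line by `indent` spaces unless it is marked flush."""
    return "\n".join(
        line.text if line.flush else " " * indent + line.text
        for line in lines
    )


# Continuation lines of the pre block are marked flush, as in the report.
body = [
    Line("<p>This is normal text.</p>"),
    Line("<pre><code>Now here&#39;s some"),
    Line("pre-formatted text", flush=True),
    Line("across many lines.</code></pre>", flush=True),
    Line("<p>And finally normal text again.</p>"),
]

rendered = render_nested(body, 4)
print(rendered)
```

In ordinary HTML output the flush lines are what keeps extra spaces out of the pre element; inside a YAML block scalar the same behaviour leaves those lines outside the scalar's indentation.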

castedo commented 1 year ago

I'm not blocked by this. I have an alternative approach for my usage scenario. So feel free to close.

My usage scenario is having pandoc output HTML but in a format that stores metadata, title HTML, abstract HTML and body HTML as separate structured data variables. I then have separate code (in Python) doing further processing and eventually injecting these and derived variables into jinja2 templates that are part of building a larger static website.

I currently handle this by having pandoc write "HTML" output through a template that is actually the following XML:

<obj>
$if(abstract)$
  <str key="abstract"><![CDATA[$abstract$]]></str>
$endif$
  <str key="body"><![CDATA[$body$]]></str>
  <str key="title"><![CDATA[$title$]]></str>
</obj>

which is in "JSOML" format: https://gitlab.com/castedo/jsoml

With JSOML I can process the output file with the same semantics as JSON and YAML, but the format is diff-friendly like YAML and easier to read than JSON when looking at strings of large HTML text.
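As a rough sketch of the downstream Python step, a file produced by the XML template above could be read with the standard library alone. This does not use the JSOML library or claim to match its API; `load_strings` is a hypothetical helper and the sample document is made up to mirror the template.

```python
import xml.etree.ElementTree as ET

# Made-up sample mirroring the template: each <str> wraps its value in CDATA,
# so the HTML inside needs no escaping.
sample = """<obj>
  <str key="body"><![CDATA[<p>This is normal text.</p>
<pre><code>pre-formatted
text</code></pre>]]></str>
  <str key="title"><![CDATA[Doc with pre-formatted text]]></str>
</obj>"""


def load_strings(xml_text: str) -> dict:
    """Collect each <str key="..."> child of the root into a dict."""
    root = ET.fromstring(xml_text)
    return {el.get("key"): el.text or "" for el in root.findall("str")}


data = load_strings(sample)
print(data["title"])
print(data["body"])
```

The CDATA wrapper sidesteps the indentation question entirely, since the pre-formatted lines are stored verbatim. One caveat is that the content must not itself contain the sequence `]]>`, which would end the CDATA section early.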