jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

Dockerfile syntax highlighting produces warning #139

Closed: aronatkins closed this issue 2 years ago

aronatkins commented 2 years ago

Given the Markdown document:

````markdown
This is a very minimal `Dockerfile`:

``` dockerfile
FROM ubuntu:bionic

RUN apt-get update
```
````

Starting with the pandoc-2.16 release, converting this file to HTML produces a warning. Every later release, including the most recent (pandoc-2.17.0.1), produces the same warning. The pandoc-2.15 release converts without a warning.

```bash
./pandoc-2.16/bin/pandoc index.md -o index-2.16.html
#=> [WARNING] Could not highlight code block:
#=>   Unknown syntax or context: ("Dockerfile","BashOneLine##Bash")
```
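A note on the message itself: `BashOneLine##Bash` follows the KDE convention for cross-syntax references, where the text before `##` names a context and the text after it names the syntax definition that owns that context. The warning reports the whole string as a context of `Dockerfile`, which suggests (an inference from the message, not something confirmed in this thread) that the reference was looked up without being split. A minimal Python sketch of the split follows; `split_context_ref` is a hypothetical helper for illustration, not part of skylighting or pandoc.

```python
def split_context_ref(ref, current_syntax):
    """Split a KDE-style context reference into (syntax, context).

    "Context##Syntax" points at a context in another syntax definition;
    a name without "##" refers to a context in the current syntax.
    """
    context, sep, syntax = ref.partition("##")
    if not sep:
        # No "##": the context belongs to the syntax being processed.
        return (current_syntax, ref)
    return (syntax, context)


# The reference from the warning, seen from inside the Dockerfile syntax.
print(split_context_ref("BashOneLine##Bash", "Dockerfile"))
# prints ('Bash', 'BashOneLine')
```

Looking up `BashOneLine` in the `Bash` definition, rather than `BashOneLine##Bash` in `Dockerfile`, is what the reference appears to intend.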

The HTML produced by pandoc-2.16 does not have any sourceCode annotations:

```html
<p>This is a very minimal <code>Dockerfile</code>:</p>
<pre class="dockerfile"><code>FROM ubuntu:bionic

RUN apt-get update</code></pre>
```

```bash
./pandoc-2.15/bin/pandoc index.md -o index-2.15.html
# no output produced...
```

For comparison, here is the HTML produced by pandoc-2.15:

```html
<p>This is a very minimal <code>Dockerfile</code>:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode dockerfile"><code class="sourceCode dockerfile"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> ubuntu:bionic</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">RUN</span> apt-get update</span></code></pre></div>
```

aronatkins commented 2 years ago

With an older syntax definition, up-to-date pandoc (2.17.0.1) parses the block successfully and adds sourceCode annotations.

The definition used: https://github.com/KDE/syntax-highlighting/blob/10c9941d364cae8fdbcca123d090308aca8ecd33/data/syntax/dockerfile.xml

```bash
./pandoc-2.17.0.1/bin/pandoc --syntax-definition dockerfile.xml index.md -o index.html
```

The resulting HTML is similar to what we get with pandoc-2.15:

```html
<p>This is a very minimal <code>Dockerfile</code>:</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode dockerfile"><code class="sourceCode dockerfile"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> ubuntu:bionic</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">RUN</span> apt-get update</span></code></pre></div>
```
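The outputs above differ in one checkable way: a highlighted block carries `sourceCode` classes, while the pandoc-2.16 fallback is a bare `<pre class="dockerfile">`. When comparing several pandoc builds, that check can be automated. The sketch below assumes a plain substring test is enough for pandoc's HTML; `is_highlighted` is a hypothetical helper name, not a pandoc or skylighting API.

```python
def is_highlighted(html: str) -> bool:
    """Report whether pandoc HTML output contains sourceCode annotations."""
    # Highlighted blocks carry class="sourceCode ..." on the div, pre, and
    # code elements. Matching on the attribute alone also covers output
    # where pandoc breaks the line between `<pre` and `class=`.
    return 'class="sourceCode' in html


# Fragments shaped like the outputs quoted in this issue.
plain = '<pre class="dockerfile"><code>FROM ubuntu:bionic</code></pre>'
marked = '<div class="sourceCode" id="cb1"><pre\nclass="sourceCode dockerfile">'

print(is_highlighted(plain))   # False
print(is_highlighted(marked))  # True
```

Running this over `index-2.15.html`, `index-2.16.html`, and `index.html` would flag only the pandoc-2.16 output as unhighlighted.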

This change appears to correspond to the update to skylighting-0.12.1 in pandoc-2.16, which contained a Dockerfile syntax update: https://github.com/jgm/skylighting/commit/22fe4fd34b8d8794046dc448dc77c192dafc3392

jgm commented 2 years ago

Transferring to skylighting.

jgm commented 2 years ago

Note: this only affects two tokenizers: markdown and dockerfile.