jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

Markdown: first list item marker in fenced code block parsed differently #84

Closed by matoushs 5 years ago

matoushs commented 5 years ago

The first list item marker inside a fenced code block is highlighted differently from the others. When the code block has the markdown class, the span starts after the first list marker (1.), so the marker is left outside it. With other classes, the list marker (1.) is placed inside the span as its content.

I compiled the following code with: pandoc -f markdown -t html

# Testing Fenced Code Blocks

```markdown
Ordered List - Markdown
1.  bread
2.  milk
3.  sugar
```

```javascript
Ordered List - Javscript
1.  bread
2.  milk
3.  sugar
```

```markdown
Enumerated List - Markdown
-  Take out trash
-  Vaccuum
    - Bedrooms
-  Wash dishes
```

This produces:


Ordered List - Markdown

<pre class="sourceCode markdown"><code class="sourceCode markdown"><a class="sourceLine" id="cb1-1" title="1">Test Paragraph</a>
<a class="sourceLine" id="cb1-2" title="2">1. <span class="fl"> bread</span></a>

Ordered List - Javscript

<pre class="sourceCode javascript"><code class="sourceCode javascript"><a class="sourceLine" id="cb2-1" title="1">Test Paragraph</a>
<a class="sourceLine" id="cb2-2" title="2"><span class="fl">1.</span>  bread</a>

Enumerated List - Markdown

<pre class="sourceCode markdown"><code class="sourceCode markdown"><a class="sourceLine" id="cb3-1" title="1">Test Paragraph</a>
<a class="sourceLine" id="cb3-2" title="2">- <span class="fl"> Take out trash</span></a>

Note that in the first and third examples the list marker ("1." or "-") appears before the span, but in the second example it appears within the span. This produces correct syntax highlighting.
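To make the difference easy to check mechanically, here is a small Python sketch that extracts whatever text sits between the opening `<a ...>` tag and the first `<span>` on a highlighted line. The helper name `text_before_first_span` is made up for this illustration, and it is only meant to handle lines shaped like the two quoted above.

```python
import re

# Second output line of the markdown and javascript blocks, copied from above.
MARKDOWN_LINE = '<a class="sourceLine" id="cb1-2" title="2">1. <span class="fl"> bread</span></a>'
JAVASCRIPT_LINE = '<a class="sourceLine" id="cb2-2" title="2"><span class="fl">1.</span>  bread</a>'


def text_before_first_span(line):
    """Return the text between the opening <a ...> tag and the first <span>.

    An empty string means the line content starts inside a span.
    Returns None if the line does not have the expected shape.
    Hypothetical helper, written only for lines like the samples above.
    """
    match = re.match(r'<a [^>]*>(.*?)<span', line)
    return match.group(1) if match else None


print(repr(text_before_first_span(MARKDOWN_LINE)))
print(repr(text_before_first_span(JAVASCRIPT_LINE)))
```

A non-empty result means the list marker was emitted outside the span, which is the case for the markdown-class block.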

My setup is: macOS Mojave 10.14.5, pandoc 2.7.1

mb21 commented 5 years ago

I guess this issue should be moved to https://github.com/jgm/skylighting/

jgm commented 5 years ago

I just tried this in Kate and it appears to work there, so this might be an issue in skylighting rather than the markdown.xml syntax definition.

jgm commented 5 years ago

Relevant bits of trace output:

...
FALLTHROUGH Just (NormalTok,"Hi")
...
Trying rule Rule {rMatcher = RegExpr (RE {reString = "[\\d]+\\.\\s", reCaseSensitive = True}), rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Just 0, rContextSwitch = [Push ("Markdown","numlist")]}
RegExpr MATCHED Just (NormalTok,"1. ")
CONTEXT STACK ["numlist","Normal Text"]
FALLTHROUGH Just (StringTok," ")
etc.

As far as I can see, we're interpreting the syntax definition correctly. First we match on

      <context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
        <RegExpr context="numlist" String="^[\d]+\.\s"/>

Since the RegExpr element has no attribute, the parent attribute Normal Text is used for what matches here (1.). Then we switch to the "numlist" context and from then on everything has the String attribute (which this definition uses for numbered lists).
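The interpretation described above can be sketched as a toy tokenizer in Python. This is not skylighting's code: the `CONTEXTS` table, the rule tuple layout, and the function names are assumptions made for illustration, and only the two contexts mentioned in the trace are modeled. It handles a single line and ignores lineEndContext.

```python
import re

# Hypothetical, heavily simplified model of two contexts from markdown.xml.
# Each rule is (regex, attribute, required_column, context_to_push).
# attribute=None means the rule inherits the attribute of its own context.
CONTEXTS = {
    "Normal Text": {
        "attribute": "NormalTok",
        "rules": [(re.compile(r"[\d]+\.\s"), None, 0, "numlist")],
    },
    "numlist": {
        "attribute": "StringTok",
        "rules": [],
    },
}


def merge_adjacent(tokens):
    """Join neighbouring tokens that carry the same attribute."""
    merged = []
    for attribute, text in tokens:
        if merged and merged[-1][0] == attribute:
            merged[-1] = (attribute, merged[-1][1] + text)
        else:
            merged.append((attribute, text))
    return merged


def tokenize_line(line, start_context="Normal Text"):
    """Tokenize one line, starting in start_context."""
    stack = [start_context]
    tokens = []
    pos = 0
    while pos < len(line):
        context = CONTEXTS[stack[-1]]
        for regex, attribute, column, push in context["rules"]:
            if column is not None and pos != column:
                continue
            match = regex.match(line, pos)
            if match:
                # No attribute on the rule: the matched text gets the
                # attribute of the context that contains the rule.
                tokens.append((attribute or context["attribute"], match.group(0)))
                pos = match.end()
                stack.append(push)
                break
        else:
            # Fallthrough: no rule matched, so consume one character
            # with the current context's default attribute.
            tokens.append((context["attribute"], line[pos]))
            pos += 1
    return merge_adjacent(tokens)


print(tokenize_line("1.  bread"))
```

Under these assumptions the marker `1. ` is emitted with the Normal Text attribute before the switch to numlist, and only the rest of the line receives the String attribute, which is the same split the trace shows.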

But since Kate works, maybe there's a problem in my understanding of how this is supposed to be interpreted.

jgm commented 5 years ago

It's an easy matter to fix markdown.xml, but I'm still puzzled as to why Kate works.