jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

Add highlighting support for specific markdown code chunks using `fenced_code_attributes` syntax #143

Open cderv opened 2 years ago

cderv commented 2 years ago

Let me show an example of the use case behind this:

If you wanted to include a verbatim code chunk in a document to show how to write a fenced code block with attributes and a class, you would write something like this in a .md document:

---
title: "demo"
---

Here is how you could create a code chunk with attributes in your Markdown document using Pandoc's `fenced_code_attributes` extension

````markdown
# Demo of `fenced_code_attributes`

This is a haskell code chunk using explicit class

```{.haskell}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
```

or using the short version

```haskell
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
```
````

Currently we would convert to HTML using

pandoc -t html -s -o test.html --highlight-style zenburn test.md

to get this output: [image]

The code chunk written with the ```{} syntax is not highlighted like the other one.

For Quarto (https://quarto.org/), we have made some adjustments to markdown.xml to support this syntax, so we can pass the new syntax definition file to Pandoc:

pandoc -t html -s -o test.html --highlight-style zenburn --syntax-definition markdown.xml test.md

which gives us this output [image], where both code chunks are treated the same.

For now, we bundle this file into Quarto and pass it to Pandoc during conversion. This matters to us because we rely on Pandoc's code block syntax with attributes, and we also have a special syntax for code chunk engines that uses the same ```{} syntax.

I am suggesting that these adjustments be added to the markdown syntax highlighting file in this repo directly, rather than in KDE, for two reasons. First, I believe the syntax is quite specific to Pandoc's Markdown, so the KDE repo would not be the right place. Second, I noticed that you are already patching the markdown.xml file using https://github.com/jgm/skylighting/blob/master/skylighting-core/xml/markdown.xml.patch

Do you think this would be a good place to add such a patch? Are you interested in this highlighting support? Or would it be better in the KDE repo directly, even if this is quite specific (unless it is not)?

Otherwise, we could keep it bundled in our tool, but that would leave us with a fork of markdown.xml, and we thought it could be of interest to make it available more generally.

Thanks for reading.

jgm commented 2 years ago

I'd be open to including a patched version of markdown.xml in skylighting. Does your patch reliably discriminate code blocks with {...} attributes from inline code that happens to start with ```{.class} and breaks over a line?

cderv commented 2 years ago

You mean inline code like this?

````markdown
Inline ```doubleMe x = x + x
```{.haskell}
````

I am not sure I understood correctly as I never encountered that and thought it was supported.

The current patch does not account for that, and the second line ```{.haskell} will be highlighted [image]

It is not obvious to me how a KDE syntax definition could handle this so that the line is not highlighted, or is highlighted differently.

jgm commented 2 years ago

Inline code in markdown can start/end with any number of backticks. There is potential for ambiguity, but we resolve it to a code block if the line starts with ``` followed by attributes and then the end of the line.
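To make the rule above concrete, here is a minimal Python sketch of a line-level check. It is an illustration only, not pandoc's parser or the skylighting patch: the helper name `is_attr_fence_line` and the regular expression are hypothetical, and the pattern assumes the attribute block contains no nested braces.

```python
import re

# Hypothetical helper, not pandoc's or skylighting's actual rule:
# a line opens an attribute fence when it starts with three or more
# backticks, then a {...} attribute block, then only optional whitespace.
ATTR_FENCE = re.compile(r"^`{3,}\s*\{[^{}]*\}\s*$")


def is_attr_fence_line(line: str) -> bool:
    """Return True if `line` looks like ```{.class key=value} and nothing else."""
    return ATTR_FENCE.match(line) is not None


print(is_attr_fence_line("```{.haskell}"))                 # True
print(is_attr_fence_line("```{.haskell} trailing text"))   # False
print(is_attr_fence_line("Inline ```doubleMe x = x + x"))  # False
```

A check like this looks at one line in isolation, which is why it cannot tell whether an earlier line left an inline code span open.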

cderv commented 2 years ago

So if my example is correct, currently we get this when using

````markdown
Some content 

Inline ```doubleMe x = x + x
```{.haskell}

Other content
````

![image](https://user-images.githubusercontent.com/6791940/151850443-4b273a80-42c1-471e-a00b-1d429fa0fa3c.png)

With the patched version, this is the result:
![image](https://user-images.githubusercontent.com/6791940/151850580-05a122a6-48c4-4d4c-8496-a63bcb4d701f.png)

Slightly better, but ```` ```{.haskell} ```` is seen in both cases as the start of a code block, not the end of an inline block.

I'll see how this can be improved. I am not sure how KDE handles matching that depends on the relation between lines. 🤔

cderv commented 2 years ago

So I had a quick look at how we could improve support for inline code that spans several lines. It is not straightforward, and success is not guaranteed, as I am no expert in KDE syntax files.

This PR is really about detecting code blocks with class attributes inside a markdown code block.

Can we consider a PR for that only, and maybe open a separate issue for the other idea if you really want this Pandoc syntax to be supported?

jgm commented 2 years ago

Maybe it's a separate issue, if the current highlighting doesn't support this either. I'd prefer a complete solution, but I understand it might be complex. (KDE allows you to create a subsidiary context when you have the start of a code inline and exit it at the stop; that would be how to handle it.) In any case, I'd want the changes to be accepted upstream by KDE; otherwise we start to diverge and it gets to be a maintenance hassle.
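The subsidiary-context idea can be sketched outside KDE's XML format. The Python below is only an analogue under stated assumptions, not a KDE syntax definition: `classify_lines`, its labels, and the regular expressions are hypothetical, a blank line is assumed to end any unclosed inline span, and the body and closing fence of a code block are not tracked.

```python
import re

BACKTICK_RUN = re.compile(r"`+")
ATTR_FENCE = re.compile(r"^`{3,}\s*\{[^{}]*\}\s*$")


def classify_lines(lines):
    """Label each line, tracking an open inline code span across lines.

    open_run holds the backtick count of an unclosed inline span (0 when
    none is open); it plays the role of the subsidiary context.
    """
    labels = []
    open_run = 0
    for line in lines:
        if not line.strip():
            open_run = 0  # assumption: a blank line ends the paragraph
            labels.append("text")
            continue
        started_inside = open_run > 0
        if not started_inside and ATTR_FENCE.match(line):
            labels.append("fence-start")
            continue
        for match in BACKTICK_RUN.finditer(line):
            run = len(match.group())
            if open_run == 0:
                open_run = run  # enter the inline-code context
            elif run == open_run:
                open_run = 0    # matching run: leave the context
        labels.append("inline-continues" if started_inside else "text")
    return labels


sample = [
    "Some content",
    "",
    "Inline ```doubleMe x = x + x",
    "```{.haskell}",
    "",
    "Other content",
]
print(classify_lines(sample))
# ['text', 'text', 'text', 'inline-continues', 'text', 'text']
```

With the context tracked, the ```{.haskell} line in the example is read as the end of the inline span; the same line on its own, with no span open, is still labelled as a fence start.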

cderv commented 2 years ago

> In any case, I'd want the changes to be accepted upstream by KDE; otherwise we start to diverge and it gets to be a maintenance hassle.

I initially submitted here because I thought this markdown syntax with fenced code classes and attributes was Pandoc specific. If you think this could be submitted to upstream KDE syntax-highlighting, then I am happy to open a PR there. Is this syntax used widely enough in markdown that KDE would be interested?

jgm commented 2 years ago

I see. Yes, it may be pandoc-specific. PHP Markdown Extra supports something similar, but only with ~~~ fences, not ```.

cderv commented 2 years ago

That was my thinking, and why I did not open a PR in KDE - I am not sure how to justify this addition to them, and especially not sure how it could impact their tools. I only test with Pandoc when making such changes (mainly because I don't have the full KDE build tools and did not manage to install such an environment).

Do you want me to try anyway and post the link here so that you can comment there too?

jgm commented 2 years ago

I think it's worth a try. After all, pandoc is a commonly used program, and some other processors may also accept this way of specifying a syntax. (I see that Maruku does, for example.) A number of others will parse this as a code block but not recognize the syntax properly, but I don't think anyone is likely to use

```{.c}
...
```

for any OTHER purpose.

cderv commented 2 years ago

Ok thanks. I'll try a PR there then, and report back.