realworldocaml / mdx

Execute code blocks inside your documentation
ISC License

markdown: parse fenced_code_attributes extension #445

Open edwintorok opened 10 months ago

edwintorok commented 10 months ago

Pandoc supports this extension: https://pandoc.org/MANUAL.html#extension-fenced_code_attributes

And this:

Recognize them in the lexer. To keep the regular expression simple, parsing of the attributes is split off into a separate 'parse' step; otherwise we hit the automaton size limits in ocamllex.
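As a rough illustration of that split (a sketch only, not the code in this PR), the lexer can treat everything between the braces as one opaque string and leave the details to ordinary OCaml. The helper name `split_info_string` is hypothetical.

```ocaml
(* Hypothetical helper, not mdx's actual code: split the text that follows
   the opening fence into the part before '{' and the raw text between the
   braces. The raw attribute text is not interpreted here. *)
let split_info_string (info : string) : string * string option =
  let info = String.trim info in
  match String.index_opt info '{' with
  | None -> (info, None)
  | Some i -> (
      match String.rindex_opt info '}' with
      | Some j when j > i ->
          let before = String.trim (String.sub info 0 i) in
          let raw = String.sub info (i + 1) (j - i - 1) in
          (before, Some (String.trim raw))
      | _ -> (info, None) (* unbalanced braces: treat as a plain info string *))
```

Because the braces are only located, not parsed, the lexer-side pattern stays small, which is the point of deferring the rest to a second step.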

According to https://quarto.org/docs/authoring/markdown-basics.html#ordering-of-attributes the ordering has to be:

For now, the output is always normalized to this form (which isn't ideal, but could be improved later):
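The linked Quarto page gives the order as identifier first, then classes, then key-value attributes. A minimal sketch of printing attributes in that order follows; the `attrs` record and `to_string` are assumptions made for illustration, not the types used in mdx.

```ocaml
(* Hypothetical representation, not mdx's actual type. *)
type attrs = {
  id : string option;        (* written as #id *)
  classes : string list;     (* written as .class *)
  key_values : string list;  (* kept as raw text, e.g. file="a.ml" *)
}

(* Print in the order identifier, classes, key-value attributes. *)
let to_string { id; classes; key_values } =
  let id = match id with None -> [] | Some i -> [ "#" ^ i ] in
  let classes = List.map (fun c -> "." ^ c) classes in
  "{" ^ String.concat " " (id @ classes @ key_values) ^ "}"
```

Whatever order the attributes had in the input, this always emits the same order, which is why normalized output can differ from the source text.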

I initially tried to fully parse the attributes in the lexer, but that exceeded the maximum size of the ocamllex automaton, so I kept it simple in this PR and do some minimal parsing in OCaml afterwards. Note that key-value pairs aren't split correctly, but when joined back together they retain the original value.
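To make the caveat concrete, here is a small sketch of whitespace-based splitting, assuming that is roughly what "minimal parsing" means here. The `token` type and the `classify`, `tokens` and `others` functions are hypothetical. A quoted value containing a space is cut into two tokens, yet joining the leftover tokens with single spaces gives the original text back.

```ocaml
(* Hypothetical token type, not mdx's actual representation. *)
type token = Id of string | Class of string | Other of string

(* Classify a token by its first character only. *)
let classify tok =
  let rest () = String.sub tok 1 (String.length tok - 1) in
  if String.length tok > 1 && tok.[0] = '#' then Id (rest ())
  else if String.length tok > 1 && tok.[0] = '.' then Class (rest ())
  else Other tok

(* Split on spaces without looking at quotes. *)
let tokens raw =
  String.split_on_char ' ' raw
  |> List.filter (fun s -> s <> "")
  |> List.map classify

(* Rejoin everything that is neither an identifier nor a class. *)
let others toks =
  List.filter_map (function Other s -> Some s | _ -> None) toks
  |> String.concat " "
```

This sketch has limits of its own: runs of several spaces inside a quoted value collapse to one, and a quoted value containing a word that starts with `#` or `.` would be misclassified.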

edwintorok commented 10 months ago

(This PR might need some wider testing to check that it doesn't break backwards compatibility. Is there a larger corpus you'd normally test changes like this on, e.g. the realworldocaml book or anything else?)