vim-pandoc / vim-pandoc-syntax

pandoc markdown syntax, to be installed alongside vim-pandoc
MIT License

CommonMark-compatible syntax overhaul #327

Open · alerque opened 4 years ago

alerque commented 4 years ago

Barring a way to use Pandoc's own internal AST for highlighting (see #300), I've become fairly convinced that the best way forward is to gut the syntax rules and rebuild them from the ground up. Given that Pandoc is moving towards 100% CommonMark spec compatibility, I suggest using this chance to target those syntax definitions, then add tweak settings that modify them for legacy Pandoc syntax.

Of course, the move to CommonMark might also make it possible to use the AST, but a robust set of CommonMark syntax rules that does not depend on running Pandoc in the background would be useful as well, even if the AST option goes forward as an even more robust alternative.

I don't think there are any other efforts targeting CommonMark specifically yet, but I think there are better Markdown syntax efforts in general, so it might be a good time to review how they do it.

fmoralesc commented 4 years ago

:+1:

Definitely a good idea. I started a CommonMark syntax ages ago, but I was blocked by the difficulty of detecting blocks; maybe we should try again, as you propose.

Some of the functionality in vim-pandoc depends on the syntax plugin, and I have been wondering about how to replace it with something like an LSP server that analyzes the documents (my more immediate goal isn't to parse the full pandoc spec, but only to get information about markdown blocks).

More to the point, I think we should design the syntax so that it exposes layers of syntax support. Pandoc supports large swathes of things that are of no regular use whatsoever, and those should only be enabled on demand (table support is a big one: it can really drag down the plugin's performance even for regular documents without tables).
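To make the layering idea concrete, here is a minimal Python sketch, not vim-pandoc-syntax code: the layer names, group names and patterns are all hypothetical. Rules are grouped into named layers, a small core is always active, and expensive layers such as tables are only compiled when explicitly enabled, so documents that never ask for them never pay for those patterns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    group: str    # highlight group name (hypothetical)
    pattern: str  # regex matched against a single line


# Hypothetical layers: "core" is always on, everything else is opt-in.
LAYERS = {
    "core": [
        Rule("heading", r"^ {0,3}#{1,6}(?:[ \t]|$)"),
        Rule("blockquote", r"^ {0,3}>"),
    ],
    "tables": [
        Rule("pipe_table", r"^\s*\|.*\|\s*$"),
    ],
}


def build_rules(enabled=()):
    """Compile the core layer plus any explicitly enabled layers, in order."""
    names = ["core"] + [name for name in enabled if name != "core"]
    return [(rule.group, re.compile(rule.pattern))
            for name in names for rule in LAYERS[name]]


def classify(line, rules):
    """Return the group of the first rule matching the line, or None."""
    for group, regex in rules:
        if regex.search(line):
            return group
    return None


if __name__ == "__main__":
    row = "| a | b |"
    print(classify(row, build_rules()))            # None: tables layer is off
    print(classify(row, build_rules(["tables"])))  # pipe_table
```

In a Vim syntax file the same effect would come from only defining the table rules when a user setting asks for them.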

alerque commented 4 years ago

Partly out of annoyance at switching back and forth between plugins myself and partly out of frustration at the ongoing bug reports that I have no desire to fix on an individual basis knowing the core is such a mess, I've actually started hacking on this. I'll post it for anybody else to observe & contribute to along the way.

If you have existing code from an attempt at this it would be interesting to see. Did you put it in a branch or something?

Of course, parsing the overall block type needs to come first, then gradually adding inline markup, allowing each element only inside the blocks or other inlines permitted to contain it.
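The CommonMark spec's own suggested parsing strategy uses the same ordering: a first phase builds the block structure line by line, and a second phase parses inline markup only inside blocks that allow it (paragraphs and headings, but not code blocks). The Python sketch below illustrates that ordering on a deliberately tiny subset (ATX headings, backtick fences, paragraphs, code spans and `*emphasis*`). It ignores most of the spec's edge cases, such as closing `#` sequences and the full delimiter and precedence rules, and is not how vim-pandoc-syntax is implemented.

```python
import re

ATX = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$")
OPEN_FENCE = re.compile(r"^ {0,3}(`{3,})")
CLOSE_FENCE = re.compile(r"^ {0,3}(`{3,})[ \t]*$")
# Phase 2 (simplified): a code span, or *emphasis* not starting/ending in whitespace.
INLINE = re.compile(r"`([^`]+)`|\*([^*\s](?:[^*]*[^*\s])?)\*")


def parse_blocks(text):
    """Phase 1: group lines into (kind, raw_content) blocks."""
    blocks, para, code, fence = [], [], [], None

    def flush_para():
        if para:
            blocks.append(("paragraph", " ".join(part.strip() for part in para)))
            para.clear()

    for line in text.splitlines():
        if fence is not None:
            # Inside a fence, only a closing fence at least as long ends the block.
            close = CLOSE_FENCE.match(line)
            if close and len(close.group(1)) >= len(fence):
                blocks.append(("code", "\n".join(code)))
                code.clear()
                fence = None
            else:
                code.append(line)
            continue
        opening = OPEN_FENCE.match(line)
        heading = ATX.match(line)
        if opening:
            flush_para()
            fence = opening.group(1)
        elif heading:
            flush_para()
            blocks.append(("heading", heading.group(2) or ""))
        elif not line.strip():
            flush_para()
        else:
            para.append(line)
    flush_para()
    if fence is not None:  # an unclosed fence runs to the end of the document
        blocks.append(("code", "\n".join(code)))
    return blocks


def parse_inlines(s):
    """Phase 2: split block text into text / code / emph runs."""
    runs, pos = [], 0
    for m in INLINE.finditer(s):
        if m.start() > pos:
            runs.append(("text", s[pos:m.start()]))
        if m.group(1) is not None:
            runs.append(("code", m.group(1)))
        else:
            runs.append(("emph", m.group(2)))
        pos = m.end()
    if pos < len(s):
        runs.append(("text", s[pos:]))
    return runs


def parse(text):
    """Run phase 1, then phase 2 only inside blocks that may contain inlines."""
    return [(kind, parse_inlines(body) if kind in ("paragraph", "heading") else body)
            for kind, body in parse_blocks(text)]


if __name__ == "__main__":
    doc = "# Title\n\nSome *emph* and `code`.\n\n```\n*not emph*\n```\n"
    for block in parse(doc):
        print(block)
```

In the example document the asterisks inside the fenced block stay literal, because phase 2 never runs on code blocks; a regex-only syntax rule has to recreate that containment through `contains=` and region nesting instead.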

fmoralesc commented 4 years ago

Great work! Sorry I haven't been of much use about this whole issue, but my priorities have been elsewhere as you know. As you say, the problems are very deep, so it's also pretty daunting to tackle the issue, and I was hesitant to start the syntax from scratch (believe it or not... it would be the third/fourth time!)

> If you have existing code from an attempt at this it would be interesting to see. Did you put it in a branch or something?

This was years ago by now. I might have a copy around somewhere, but I moved computers so I'll have to search it.

alerque commented 4 years ago

> I was hesitant to start the syntax from scratch (believe it or not... it would be the third/fourth time!)

Oh, I'd believe it. And know that my instinct to gut and replace it isn't a lack of appreciation or a reflection on the quality of previous efforts. This is a hard problem and a moving target. Even actual Markdown parsers have a hard time keeping up, and given the way syntax highlighting works, these attempts will always be a poor man's hack compared to a real parser.

By the way do you know of plugins that leverage Lua LPEG to supplement syntax rules?

fmoralesc commented 4 years ago

No, I am not aware of anything like that. Having an LPEG grammar for pandoc markdown or commonmark would be useful in any case, though.

alerque commented 4 years ago

The primary author of Pandoc and one of the driving forces behind CommonMark itself seems to have no fewer than three Lua related efforts up his sleeve: cmark-lua, commonmark-lua, and luacmark.

None of them is a PEG grammar, though, and I could see one being useful outside this plugin too! I might start a project along those lines anyway.

Edit: Four: lcmark.

Edit: Six, sort of: two earlier projects are actually PEG-based, but not targeted at CommonMark: peg-markdown and lunamark.
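For readers unfamiliar with PEGs, the key difference from regex-based Vim syntax rules is ordered choice: alternatives are tried in order and the first one that succeeds wins. LPEG builds grammars from operators (`*` for sequence, `+` for ordered choice, `^n` for at least n repetitions) plus captures such as `lpeg.Cp()` for positions. The Python sketch below mimics that style with plain combinator functions (it is not LPEG itself) and has the grammar emit `(start, end, group)` spans of the kind a highlighter could consume; the group names and the two inline rules are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int, str]
# A parser takes (text, pos, spans) and returns the new position, or None on failure.
Parser = Callable[[str, int, List[Span]], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    def p(text, pos, spans):
        return pos + len(s) if text.startswith(s, pos) else None
    return p


def any_but(chars: str) -> Parser:
    """Match one character not in chars (any character if chars is empty)."""
    def p(text, pos, spans):
        return pos + 1 if pos < len(text) and text[pos] not in chars else None
    return p


def seq(*parsers: Parser) -> Parser:
    """Sequence (like LPEG `*`): all must match in order; drop spans on failure."""
    def p(text, pos, spans):
        mark = len(spans)
        for q in parsers:
            pos = q(text, pos, spans)
            if pos is None:
                del spans[mark:]
                return None
        return pos
    return p


def choice(*parsers: Parser) -> Parser:
    """Ordered choice (like LPEG `+`): the first alternative that succeeds wins."""
    def p(text, pos, spans):
        for q in parsers:
            end = q(text, pos, spans)
            if end is not None:
                return end
        return None
    return p


def many(q: Parser, at_least: int = 0) -> Parser:
    """Greedy repetition (like LPEG `^n`): at least `at_least` matches, no backtracking."""
    def p(text, pos, spans):
        mark, count = len(spans), 0
        while (end := q(text, pos, spans)) is not None and end > pos:
            pos, count = end, count + 1
        if count < at_least:
            del spans[mark:]
            return None
        return pos
    return p


def cap(group: str, q: Parser) -> Parser:
    """Record a (start, end, group) span whenever q matches."""
    def p(text, pos, spans):
        end = q(text, pos, spans)
        if end is not None:
            spans.append((pos, end, group))
        return end
    return p


# Illustrative inline grammar: code span, *emphasis*, or any other single character.
code_span = cap("code", seq(lit("`"), many(any_but("`\n"), 1), lit("`")))
emphasis = cap("emph", seq(lit("*"), many(any_but("*\n"), 1), lit("*")))
inline = many(choice(code_span, emphasis, any_but("")))


def highlight(text: str) -> List[Span]:
    """Return highlight spans as (start offset, end offset, group)."""
    spans: List[Span] = []
    inline(text, 0, spans)
    return spans


if __name__ == "__main__":
    print(highlight("a *b* `c`"))
    print(highlight("`*x*`"))
```

Because `code_span` is listed before `emphasis` in the ordered choice, `` `*x*` `` comes out as a single code span rather than emphasis, and an unclosed `*` simply falls through to the catch-all alternative instead of producing a span.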

alerque commented 4 years ago

I'm not at all sure yet another implementation is needed, but, largely as an experiment in how such a parser might be leveraged to aid syntax highlighting (particularly in NeoVim), I've started an LPEG-based CommonMark parser of my own. I'm also interested in replacing the Markdown parser bundled in SILE (currently a fork of lunamark) with something CommonMark compliant.

If anything comes of the experiment, it may well turn out that the cmark-lua library is actually the thing to use, and I'll just wrap it in the needed interface.

alerque commented 4 years ago

Here is a plugin using LPEG for syntax highlighting: mparse. Interestingly, that project points out that there is a pure Lua LPEG implementation out there. I don't think it's necessary, since NeoVim includes LPEG support out of the box (I still need to confirm that holds for all distros, not just some), but it's good to know it exists.