Closed wooorm closed 3 years ago
a) This option doesn't seem sustainable. The remark maintainers would become gatekeepers for new syntax types. Not all extra syntax features have the same level of completeness and maturity, and representing this would become challenging. For any feature that doesn't get merged, the author's only option would be to fork the entire parser.
b) It may not be an easy approach, but it seems like the most maintainable one. Finite State Machines (FSMs) are made up of states and transitions. As long as the hooks allow adding and removing both states and transitions, new syntax layers should be doable as plugins.
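To make option b) a bit more concrete, here is a minimal sketch of a state machine whose states and transitions can be added and removed from the outside. Everything in it (`createMachine`, `addTransition`, `mathPlugin`, the state names) is a hypothetical illustration and not micromark's or remark's API; a real Markdown tokenizer would be far more involved.

```javascript
// Hypothetical sketch: createMachine and its methods are illustrative names,
// not part of micromark or remark.
function createMachine() {
  // Maps a state name to its outgoing transitions (input symbol -> next state).
  const states = new Map()

  function addState(name) {
    if (!states.has(name)) states.set(name, new Map())
  }

  function removeState(name) {
    states.delete(name)
    // Also drop transitions that pointed at the removed state.
    for (const transitions of states.values()) {
      for (const [symbol, target] of transitions) {
        if (target === name) transitions.delete(symbol)
      }
    }
  }

  function addTransition(from, symbol, to) {
    addState(from)
    addState(to)
    states.get(from).set(symbol, to)
  }

  function removeTransition(from, symbol) {
    const transitions = states.get(from)
    if (transitions) transitions.delete(symbol)
  }

  // Follow one transition per symbol; stop at the first symbol without one.
  function run(start, symbols) {
    let current = start
    const visited = [current]
    for (const symbol of symbols) {
      const transitions = states.get(current)
      const next = transitions ? transitions.get(symbol) : undefined
      if (next === undefined) break
      current = next
      visited.push(current)
    }
    return visited
  }

  return {addState, removeState, addTransition, removeTransition, run}
}

// "Core" syntax: emphasis delimited by asterisks.
const machine = createMachine()
machine.addTransition('text', '*', 'emphasis')
machine.addTransition('emphasis', '*', 'text')

// A hypothetical plugin that layers math syntax on top of the core.
function mathPlugin(target) {
  target.addTransition('text', '$', 'math')
  target.addTransition('math', '$', 'text')
}

mathPlugin(machine)
console.log(machine.run('text', ['$', '$']))
```

In this sketch the plugin only touches the machine through the add/remove hooks, so the core never needs to know about math. The hard part, which the sketch does not address, is what happens when two plugins want the same transition.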
A couple considerations this would raise.
c) Maybe. As discussed in https://github.com/micromark/micromark/issues/9, backtracking can easily lead to performance issues, which seems to go against the stated goal of micromark.
> a) This option doesn't seem sustainable. The remark maintainers would become gatekeepers for new syntax types.
I would like to add that there are a couple of important syntax extensions: frontmatter, GFM, and MDX. Syntax extensions lead to different implementations, which leads to Markdown being less portable to other vendors, which is annoying! Markdown already has HTML as a place for extensions, and as unified we are also pushing MDX. Maybe standardising a couple of optional features and not allowing everything is a good thing for Markdown?
Having some default extensions hidden behind flags could be fine, as long as there is a way to add syntax that is not bundled with micromark core.
Could you expand on why you think that is important?
Other languages aren’t like this. In JS, CSS, or HTML it isn’t normal to do non-standard stuff (there are languages on top of them though, but those are implemented in new parsers)
> Could you expand on why you think that is important?
Many remark plugins hook into the tokenizer to provide new syntax. I want these projects to be able to safely upgrade to the new micromark-based remark.
> Other languages aren’t like this. In JS, CSS, or HTML it isn’t normal to do non-standard stuff (there are languages on top of them though, but those are implemented in new parsers)
Depends on what you mean, there are tools for these languages that offer pluggable parsers, for example:
Other languages:
Babel has syntax plugins, but those pass an option to the parser (essentially flow: true).
PostCSS has different parsers, which are separate projects that transform to the PostCSS AST.
For the format: extensions make the format not portable. I think this holds the Markdown format back, and I think that we are in a position to move Markdown forward.
For the current extensions API: it isn’t very nice, it feels hackish, the code for plugins looks a bit spaghetti/buggy too.
One interesting idea is Generic directives/plugins. TLDR:
:name[content]{key=val}, such as :cit[smith04] (inline)
::name[content]{key=val}, such as ::toc[Table des matières] (leaf block)
:::name[inline-content]{key=val}
contents, which are sometimes further block elements
:::
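As a rough illustration of how regular this syntax is, the opening line of each of the three forms can be recognised with one pattern. This is only a sketch under simplifying assumptions: `parseDirective` is a hypothetical helper, the `'container'` label for the three-colon form is my own naming here, and the pattern ignores nesting, escapes, and quoted attribute values that a real grammar would need.

```javascript
// Hypothetical helper, not part of micromark or remark.
// The number of leading colons decides the kind of directive.
const kinds = {1: 'inline', 2: 'leaf', 3: 'container'}

// name, optional [label], optional {attributes}; deliberately simplified.
const pattern = /^(:{1,3})([A-Za-z][\w-]*)(?:\[([^\]]*)\])?(?:\{([^}]*)\})?$/

function parseDirective(source) {
  const match = pattern.exec(source)
  if (!match) return null

  // Unmatched optional groups are undefined, so the defaults apply.
  const [, colons, name, label = '', rawAttributes = ''] = match

  // Attributes are whitespace-separated key=val pairs (assumption: no quoting).
  const attributes = {}
  for (const pair of rawAttributes.split(/\s+/).filter(Boolean)) {
    const index = pair.indexOf('=')
    if (index === -1) attributes[pair] = ''
    else attributes[pair.slice(0, index)] = pair.slice(index + 1)
  }

  return {kind: kinds[colons.length], name, label, attributes}
}

console.log(parseDirective(':cit[smith04]'))
console.log(parseDirective('::toc[Table des matières]'))
console.log(parseDirective(':::name[inline-content]{key=val}'))
```

The point of the sketch is that one generic shape (name, label, attributes) could cover many of the one-off syntaxes that plugins currently invent, leaving the meaning of each `name` to a transformer.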
Note that some things such as remark-breaks can be done on a CST.
Say we’d support frontmatter, GFM, MDX, and these generic extensions, are other things really needed?
So frontmatter would be specced/provided by us, shortcodes and attrs could be a generic extension, and last, zmarkdown could be a fork (like how gfm is a fork of cmark).
That could work, thanks for outlining your idea so clearly @wooorm! :bowing_man: @vhf and @djm this would most directly impact your projects, thoughts? :thought_balloon:
I see Markdown as something much easier to learn, easier to write, and less powerful than HTML. Custom syntax elements keep those benefits, whereas mixing HTML into Markdown does not.
I like the Babel approach and, until now, the remark approach of writing plugins that can hook into any part of the parsing/compiling process. My preference would be to keep it that way. If the consensus is to go another direction I'll adapt though; I'm not the one doing the hard work on micromark. :)
Forking micromark would be an option for my projects. One of the costs of that (and you could see it as a benefit if custom syntax is holding Markdown back) is that we won't be able to create a new tool by cherry-picking a few libs and plugins and composing them together.
IIRC we have Gatsby as a sponsor, and a few Gatsby contributors are also unified/remark/rehype contributors; unfortunately, I don't know who to ping. I think their perspective on this would also be of interest, as their project would be impacted (example) as well.
Nobody said this yet, but I think it’s worth mentioning that I don’t see any way that current plugins that integrate with remark-parse could work as-is with micromark. Even if micromark has extensions, they’d need to be rewritten entirely. This does not affect transformer plugins (mdast remains the same, I think).
Thanks Victor!
> I like the Babel approach
Do you have an example of how Babel allows custom syntax? What I found is that Babel has syntax plugins but those pass an option to the parser (essentially flow: true).
Oh and a question: could the whole zestedesavoir content be converted from its older custom syntax, to a new standard? We could have “codemods” that take remark-extension-markdown and port it to micromark-generic-directive-markdown?
I’ll post the Gatsby comment below so it’s a separate link
Going through all gatsby-remark plugins in the Gatsby monorepo gives us gatsby-remark-katex and gatsby-remark-custom-blocks that integrate with remark-parse.
gatsby-remark-custom-blocks could be (should be?) changed to the generic directives syntax:
:::name[inline-content]{key=val}
contents, which are sometimes further block elements
:::
Math 🤔🤷‍♂️ It’s pretty common to support $foo$. Generic directives syntax would give :math[foo]. Could work as well?
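For a sense of what a “codemod” from dollar math to a directive might look like, here is a deliberately naive sketch. `convertInlineMath` is a hypothetical function, and the regular expression is an assumption made for illustration: it knows nothing about code spans, escaped dollars, or prices such as “$5 and $6”, all of which a real migration tool would have to handle by working on a syntax tree instead of raw text.

```javascript
// Hypothetical, naive codemod: rewrites $foo$ to :math[foo] in raw text.
// It skips math containing "]" or a line break, because that would not fit
// in a single-line directive label.
function convertInlineMath(markdown) {
  return markdown.replace(/\$([^$\n\]]+)\$/g, ':math[$1]')
}

console.log(convertInlineMath('It’s pretty common to support $foo$.'))
```

Even this tiny example shows the trade-off: the directive form is unambiguous to parse, while the dollar form needs heuristics to tell math apart from currency.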
Gatsby folks, what do you think about the remark ecosystem dropping support for any extensions? And instead, supporting only a couple (frontmatter, GFM, MDX, and generic directives)?
/cc @johno @ChristopherBiscardi @sidharthachatterjee
> Do you have an example of how Babel allows custom syntax? What I found is that Babel has syntax plugins but those pass an option to the parser (essentially flow: true).
I don't, sorry. Disregard that comment: it's based on what I remember from contributing to Babel. I could be wrong about that, and it was Babel 6.x, 4 years ago 😱
> Oh and a question: could the whole zestedesavoir content be converted from its older custom syntax, to a new standard?
Unfortunately not, for two main reasons:
I'd say we would either fork to reimplement the syntax, or stay on our current stack and possibly maintain whatever dependency becomes deprecated upstream if need be. What we currently have is pretty stable and I still see it as a viable option. :)
These are now supported in micromark. I’ll add more on how they work in cmsm later.
This state machine is finite. Markdown has extensions, which are mostly annoying but in some cases hugely useful (GFM, MDX).
We can either a) define the most useful extensions and hide them behind flags, b) support hooks for extensions to overwrite states and the like, c) figure out a way to allow backtracking and attempting a list of possibilities, or d) something else?
They all have downsides.
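To illustrate what option c) could mean in practice, here is a minimal sketch of “attempting a list of possibilities”: each candidate construct tries to match at the current position, and on failure the next candidate starts again from the same position. The names (`delimited`, `attempt`, the construct objects) are hypothetical and are not taken from micromark.

```javascript
// Hypothetical construct factory: matches marker ... marker on one line.
// tokenize returns the index just past the construct, or -1 on failure.
function delimited(name, marker) {
  return {
    name,
    tokenize(input, start) {
      if (input[start] !== marker) return -1
      const close = input.indexOf(marker, start + 1)
      return close === -1 ? -1 : close + 1
    }
  }
}

// Try each construct in order. A failed construct consumes nothing, so the
// next one retries from the same start: that retry is the backtracking.
function attempt(constructs, input, start) {
  for (const construct of constructs) {
    const end = construct.tokenize(input, start)
    if (end !== -1) return {type: construct.name, start, end}
  }
  return null
}

const constructs = [delimited('math', '$'), delimited('emphasis', '*')]
console.log(attempt(constructs, '*hi* there', 0))
console.log(attempt(constructs, '$never closed', 0))
```

The second call hints at the downside: a construct may scan far ahead before discovering that it fails, and that work is thrown away and repeated for every candidate that follows.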