syntax-tree / mdast-util-directive

mdast extension to parse and serialize generic directives (`:cite[smith04]`)
https://unifiedjs.com
MIT License

Allow the library user to specify a list of directive names that they intend to parse #10

Open csicar opened 2 months ago

csicar commented 2 months ago

Initial checklist

Problem

When the markdown contains a `:` that is not part of a directive, it is still parsed as one. For example, German has a gender-inclusive spelling that uses `:` inside a word:

Liebe Mitarbeiter:innen

This gets parsed as a directive :innen[]

Solution

Allow the library user to specify what directive names should be accepted as part of the parsing and which should be skipped.

Alternatives

I see two alternatives:

  1. Fix this in the rendering stage, i.e. make sure that when rendering a directive, we convert the AST back into the markdown representation. I think this is undesirable, since the exact (and textually correct) representation may be lost. In this example, it would be hard to tell from the AST alone whether the directive should be rendered as `:innen` or `:innen[]`.
  2. Do not parse `\w:\w` as a directive. I think this is undesirable because it changes the parsing in a non-backward-compatible way and only fixes this specific problem.

wooorm commented 2 months ago

Hey hey!

The whole point of directives is that they are all supported even when unknown. So you can write markdown with directives, and GitHub can then show that markdown even if it doesn’t understand what `my-custom-video` means. Other tools that don’t understand it, such as a markdown formatter, can still process it.

As an author, you can choose to turn this character into a plain colon by escaping it: `Liebe Mitarbeiter\:innen`. Or you can use the asterisk as the gender star, which I personally see more often: https://en.wikipedia.org/wiki/Gender_star. Asterisks are also used in markdown, so you might have to escape them too.
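The backslash escape above could also be applied automatically before parsing. The following is only a minimal sketch: `escapeInWordColons` is a hypothetical helper, not part of any library, and it assumes that a colon directly between two letters is never meant as a directive. It does not skip code spans or code blocks, so it is not safe for arbitrary markdown.

```typescript
// Hypothetical pre-processing helper (not a library API).
// Escapes a colon that sits directly between two letters, so that
// "Mitarbeiter:innen" is read as plain text instead of a directive.
// Assumption: the letter class below (ASCII plus German umlauts and ß)
// is enough for the documents being processed.
function escapeInWordColons(markdown: string): string {
  const letter = "[A-Za-zÄÖÜäöüß]";
  // Capture the letter before the colon; only peek at the letter after it,
  // so that chains such as "a:b:c" are escaped at every colon.
  const inWordColon = new RegExp("(" + letter + "):(?=" + letter + ")", "g");
  return markdown.replace(inWordColon, "$1\\:");
}

const escapedGreeting = escapeInWordColons("Liebe Mitarbeiter:innen");
```

A directive at the start of a word, such as `:cite[smith04]`, has no letter before the colon and is left alone. The trade-off is that a directive written directly after a letter would be escaped as well.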

I do think the 1st alternative you mention is quite viable for several systems, such as GH comments, where authors are only expected to write known directives and not unknown ones.

csicar commented 2 months ago

Thank you for the quick response! I'd still argue that the 1st alternative has the problem of the canonical representation: how could GitHub find out whether to represent the unknown directive as `:innen[]` or `:innen`?

> The whole point of directives is that they are all supported even when unknown.

I agree, especially for the formatting use case. For other use cases, such as rendering markdown, I'd want the parser to behave differently. This could be done by adding an option like `allowDirectives: string[]` to the directives parser.
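To make the idea concrete, here is a rough sketch of how an allow list could be approximated today as a step after parsing, without changing the parser. Everything in it is an assumption for illustration: `demoteUnknownTextDirectives` is a hypothetical helper, the node types are simplified stand-ins rather than the real mdast types, and it assumes that each node carries source offsets in `position`. Slicing the original source by those offsets is what avoids the `:innen` vs `:innen[]` ambiguity, since the text is copied instead of being re-serialized.

```typescript
// Simplified stand-ins for tree nodes; not imported from any package.
interface Offset { offset: number }
interface SourceSpan { start: Offset; end: Offset }
interface TreeNode {
  type: string;
  name?: string;
  value?: string;
  children?: TreeNode[];
  position?: SourceSpan;
}

// Hypothetical helper: replace every text directive whose name is not in
// `allowDirectives` with a text node holding the exact source it came from.
function demoteUnknownTextDirectives(
  tree: TreeNode,
  source: string,
  allowDirectives: string[]
): void {
  const visit = (node: TreeNode): void => {
    if (!node.children) return;
    node.children = node.children.map((child): TreeNode => {
      const unknown =
        child.type === "textDirective" &&
        (child.name === undefined || allowDirectives.indexOf(child.name) === -1);
      if (unknown && child.position) {
        const { start, end } = child.position;
        // Copy the original characters instead of re-serializing the node.
        return { type: "text", value: source.slice(start.offset, end.offset), position: child.position };
      }
      visit(child);
      return child;
    });
  };
  visit(tree);
}

// Hand-written tree for illustration, not actual parser output.
const demoSource = "Liebe Mitarbeiter:innen";
const demoTree: TreeNode = {
  type: "root",
  children: [{
    type: "paragraph",
    children: [
      { type: "text", value: "Liebe Mitarbeiter", position: { start: { offset: 0 }, end: { offset: 17 } } },
      { type: "textDirective", name: "innen", children: [], position: { start: { offset: 17 }, end: { offset: 23 } } },
    ],
  }],
};
demoteUnknownTextDirectives(demoTree, demoSource, ["cite"]);
```

After the call, the second child of the paragraph is a text node with the value `:innen`. The sketch does not merge it with the preceding text node, and it only handles text directives, not leaf or container directives.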

wooorm commented 2 months ago

If a tool supports unknown directives when rendering, like GH in this example, it would display the component, just like how GH shows frontmatter data as a table: it doesn't understand what the keys and values mean, but it can still display them. So it doesn't differentiate between a directive with and without `[]`. It always shows that component.