jgm / djot

A light markup language
https://djot.net
MIT License

Add "role" shorthand to attribute syntax #146

Open chrisjsewell opened 1 year ago

chrisjsewell commented 1 year ago

Similar to roles in docutils (https://docutils.sourceforge.io/docs/ref/rst/roles.html): :name:`content`, it would be nice to have an attribute shorthand, to provide "semantic meaning to content" (see also https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Roles).

This would provide a clear hook for AST preprocessors to use, as discussed in https://github.com/jgm/djot/discussions/77#discussioncomment-4269320

Currently, one could obviously just use `content`{role=name} and [content]{role=name}, but this is quite verbose if you are going to be using it often.

Similar to id=name being shortened to #name, it would be nice to use a prefix.

The one that comes to mind is =name, i.e. `content`{=name} and [content]{=name}, although this is currently used for raw-inline 😬

chrisjsewell commented 1 year ago

More generally, from an extension viewpoint, it might be nice if there was a standardised prefix for all extension types.

For example, let's say the prefix was = for now. Then this input:

[a]{=name} `a`{=name}

:::=name
a
:::

```=name
a
```

would produce an AST like:

doc
  para
    ext_inline role="name"
      str text="a"
    str text=" "
    ext_inline_verbatim role="name" text="a"
  ext_block role="name"
    para
      str text="a"
  ext_block_verbatim role="name" text="a\n"
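To make the "clear hook for AST preprocessors" a bit more concrete, here is a minimal Python sketch of a preprocessor that walks such a tree and hands each extension node to a handler registered for its role. The dict-based node shape, the apply_roles and shout names, and the handler registry are hypothetical illustrations, not djot's actual AST or API; only the four ext_* tags and the role field come from the proposal above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical, simplified AST: each node is a dict with a "tag" and optional
# "role", "text" and "children" keys. This is NOT djot's API; it only mirrors
# the shape of the proposed AST dump.
Node = Dict[str, Any]
Handler = Callable[[Node], Node]

EXT_TAGS = {"ext_inline", "ext_inline_verbatim", "ext_block", "ext_block_verbatim"}


def apply_roles(node: Node, handlers: Dict[str, Handler]) -> Node:
    """Rebuild the tree bottom-up, passing each extension node to the handler
    registered for its role. Unknown roles are left untouched, and the input
    tree is not mutated (every visited node is copied)."""
    children: List[Node] = [apply_roles(c, handlers) for c in node.get("children", [])]
    node = {**node, "children": children} if "children" in node else dict(node)
    if node["tag"] in EXT_TAGS and node.get("role") in handlers:
        return handlers[node["role"]](node)
    return node


def shout(node: Node) -> Node:
    # Hypothetical handler: replace an inline verbatim extension with upper-cased text.
    return {"tag": "str", "text": node["text"].upper()}


doc = {
    "tag": "doc",
    "children": [{
        "tag": "para",
        "children": [
            {"tag": "ext_inline_verbatim", "role": "shout", "text": "a"},
            {"tag": "str", "text": " "},
            {"tag": "ext_inline_verbatim", "role": "other", "text": "b"},
        ],
    }],
}

result = apply_roles(doc, {"shout": shout})
print(result["children"][0]["children"][0])  # {'tag': 'str', 'text': 'A'}
```

The point of a dedicated role field here is that the dispatch never has to guess which class (if any) is the "semantic" one.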

bpj commented 1 year ago

In view of {=format} already being taken by raw content, as mentioned in #77[^1], {:lang} already being suggested in #5[^2], and :emoji: already being taken, what about

:::>role
Content
:::

[content]{>role}

which, however, might be problematic because <...> is associated with {HT,X}ML (but most good punctuation characters are already taken![^3])
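For illustration, a minimal Python sketch of an attribute-block parser that accepts the proposed >role shorthand alongside djot's existing #id, .class and key=value forms. The parse_attributes name, storing the role under a "role" key, and the simplified token rules (no %comments%, no escapes inside quoted values) are assumptions made for this sketch, not how djot's own parser works.

```python
import re
from typing import Dict, List

# Simplified djot-style attribute tokens. ">role" is the hypothetical shorthand
# proposed here; "#id", ".class" and key=value / key="value" follow djot's
# existing attribute syntax. Comments and escapes are not handled.
TOKEN = re.compile(
    r'\s*(?:'
    r'>(?P<role>[\w-]+)'
    r'|#(?P<id>[\w-]+)'
    r'|\.(?P<cls>[\w-]+)'
    r'|(?P<key>[\w-]+)=(?:"(?P<qval>[^"]*)"|(?P<val>[^\s"}]+))'
    r')'
)


def parse_attributes(block: str) -> Dict[str, str]:
    """Parse e.g. '{>kbd .key #k1 lang=en}' into an attribute dict."""
    if not (block.startswith("{") and block.endswith("}")):
        raise ValueError("attribute block must be wrapped in braces")
    body, pos = block[1:-1], 0
    attrs: Dict[str, str] = {}
    classes: List[str] = []
    while body[pos:].strip():
        m = TOKEN.match(body, pos)
        if not m:
            raise ValueError(f"unexpected input at {body[pos:]!r}")
        if m["role"]:
            attrs["role"] = m["role"]
        elif m["id"]:
            attrs["id"] = m["id"]
        elif m["cls"]:
            classes.append(m["cls"])
        else:
            attrs[m["key"]] = m["qval"] if m["qval"] is not None else m["val"]
        pos = m.end()
    if classes:
        attrs["class"] = " ".join(classes)
    return attrs


print(parse_attributes("{>kbd .key #k1}"))  # {'role': 'kbd', 'id': 'k1', 'class': 'key'}
```

In this sketch the raw-content form {=html} is rejected with a ValueError, which is exactly the clash that rules out = as the role sigil.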

[^1]: Ultimately due to a Pandoc filter yours truly wrote which converted `code`{raw=format} into raw elements 😥

[^2]: IIRC also originally my own fault due to misunderstanding a short description of the CSS *:lang(subtag) pseudo-class! 😥😥

[^3]: IOW I'm going to regret this too! 😥😥😥

kaleb commented 1 year ago

Are classes not sufficient?

bpj commented 1 year ago

@kaleb I think the idea is that classes and roles are semantically different and/or that you may not want role markup included as classes in rendered HTML. I can sympathize, since I sometimes use Pandoc filters to remove certain attributes in HTML output, but I'm not sure that it needs to be handled any other way.

matklad commented 1 year ago

Reading https://borretti.me/article/brief-defense-of-xml made me think that we do need some explicit support for this.

I think being “extensible” is one of the greatest features of djot, and, if we can write in the manual “this is how you define custom elements: …”, that’ll help the users to grok the capability.

You can do this today with class or your own attribute, if you already know that it’s possible. Adding an explicit facility would help to unlock people’s imagination.

There’s also the problem that with a class the role name goes after the inline element, while it wants to come first.

So perhaps we should steal a page from adoc here?

Custom inline element:

role:[inline content here]

Custom block element:

role:::
Block content here
:::

We can restrict this to spans and divs (as that’s where assigning the role makes the most sense), but we could also allow it on arbitrary fenced elements: kbd:`ctrl+F`

bpj commented 1 year ago

The thing is that it would be annoying to have to type not-role\: [not role text]. In most Markdown flavors you need to type [not link text] \(not-url), which is annoying because (a) it happens often enough if square brackets have a meaning in your field, and (b) it is rare enough that you forget to escape the parenthesis when it does happen, and you might not always have syntax highlighting to alert you to it.

I do not really see why the role indicator needs to come before the text it applies to. Because some other LML does it? Well, djot isn't that other LML. I am far from convinced that it is a good idea for attributes to come after the text they apply to[^1], but it's already traditional, djot relies on block attributes coming before and inline attributes coming after what they apply to in order to tell the two apart, and it's good enough. So I really don't see why something like [text]{>role} wouldn't be good enough. It is at least consistent with how djot already does things (and I think the > "arrow" is kind of iconic! 😁).

As for blocks it is already the case that in

```lang
code
``` 

lang is not quite a regular class so I do not see why

:::role
text
:::

wouldn't be good enough; it would be consistent with that. Alternatively,

:::>role
text
:::

(substituting > with whatever is used for inline roles) would be consistent with inline roles.

[^1]: I find postposed attributes annoying enough that I have a snippet that lets me type something like ,a,ATTRS,TEXT,<CTRL><TAB> and have it automatically transformed into [TEXT]{ATTRS}. The main disadvantage of typing []{} is that those are shift 2 and shift 3 on my keyboard, but with the snippet I usually don't need to use shift at all. I can substitute , with any character matching the character class [^\w\s] that does not occur in ATTRS, and if ATTRS matches the regex ^[-_A-Za-z0-9]+$ a . is automatically inserted before it. I have another snippet turning ,wl,Some Text[,url][,pfx-][,-sfx],<CTRL><TAB> (where [...] indicates optional parts) into [Some Text](url#pfx-some-text-sfx), and a similar one turning ,tgs,Some Text[,pfx-][,-sfx], into [Some Text]{#pfx-some-text-sfx}. Makes my day! 😁
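The rules for the ,a, snippet are precise enough to re-create. Below is a minimal Python sketch of just that expansion; the editor and snippet engine aren't named, so expand_attr_snippet is a hypothetical stand-in for the expansion logic rather than the actual snippet, and it does not cover the ,wl, or ,tgs, variants.

```python
import re

# Per the footnote: if ATTRS is a bare name it gets a leading "." (so it
# becomes a class); anything else, e.g. "#id key=val", is copied as-is.
BARE_NAME = re.compile(r"^[-_A-Za-z0-9]+$")


def expand_attr_snippet(trigger: str) -> str:
    """Expand ',a,ATTRS,TEXT,' into '[TEXT]{ATTRS}'.

    The first character is the delimiter: a non-word, non-space character
    that must not occur in ATTRS. TEXT may contain it, because only the
    first two delimiters after the leading one are split on.
    """
    if not trigger or re.match(r"[\w\s]", trigger[0]):
        raise ValueError("trigger must start with a non-word, non-space delimiter")
    delim, body = trigger[0], trigger[1:]
    if not body.endswith(delim):
        raise ValueError("trigger must end with the delimiter")
    parts = body[:-1].split(delim, 2)
    if len(parts) != 3 or parts[0] != "a":
        raise ValueError(f"not an 'a' snippet: {trigger!r}")
    _, attrs, text = parts
    if BARE_NAME.match(attrs):
        attrs = "." + attrs
    return f"[{text}]{{{attrs}}}"


print(expand_attr_snippet(",a,smallcaps,Some Text,"))   # [Some Text]{.smallcaps}
print(expand_attr_snippet(";a;#x lang=en;Hi, there;"))  # [Hi, there]{#x lang=en}
```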

matklad commented 1 year ago

> The thing is that it would be annoying to have to type not-role\: [not role text]

I envision requiring that : has no spaces around it, and alpha:[ seems like it would be a pretty rare combination in natural text.
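A minimal Python sketch of that no-spaces rule as a regular expression, useful for checking which inputs would and would not trigger it. The ROLE_PREFIX pattern (a letter-initial name glued to :, immediately followed by [ or a backtick, and not preceded by a word character or hyphen) and the find_roles helper are assumptions for illustration, not an agreed djot rule.

```python
import re
from typing import List

# Hypothetical rule: name, then ':' with no whitespace on either side, then
# '[' (span) or '`' (verbatim). The lookbehind stops a match from starting
# in the middle of a longer word such as "not-role".
ROLE_PREFIX = re.compile(r"(?<![\w-])(?P<role>[A-Za-z][\w-]*):(?=[\[`])")


def find_roles(text: str) -> List[str]:
    """Return the role names that this rule would recognize in `text`."""
    return [m["role"] for m in ROLE_PREFIX.finditer(text)]


print(find_roles("Press kbd:`Ctrl+F` to search"))  # ['kbd']
print(find_roles("not-role: [not role text]"))     # []
print(find_roles(r"not-role\:[not role text]"))    # []
```

The backslash escape from the quoted objection falls out naturally: it sits between the name and the colon, so the pattern no longer matches.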

As to why it makes sense for the role to be a prefix: the difference between role and other attributes is that it affects the semantic interpretation of what follows, so it helps to see it first. E.g., kbd:`Ctrl+F` reads better to me than `Ctrl+F`{>kbd}, because kbd introduces a DSL (+ is a meta character separating keys), so if kbd comes first, by the time you get to + you already know that it'll be interpreted as a separator. It's the same deal as with $`e=mc^2`: for the parser it is more or less the same as `e=mc^2`$, since it doesn't give meaning to the stuff inside the backticks. For the human, syntax that announces "here comes the math" up front makes more sense.
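As an example of a role whose content is a small DSL, here is a minimal Python sketch of an HTML renderer for a hypothetical kbd:`...` role that treats + as the key separator. Nesting <kbd> elements for a key combination follows the example in the HTML specification; the render_kbd name and the choice to reject empty keys are assumptions for this sketch.

```python
from html import escape


def render_kbd(content: str) -> str:
    """Render the body of a hypothetical kbd:`...` role as HTML.

    '+' is read as the key separator; each key becomes its own <kbd>,
    and the whole combination is wrapped in an outer <kbd>.
    """
    keys = [k.strip() for k in content.split("+")]
    if any(k == "" for k in keys):
        raise ValueError(f"empty key in {content!r}")
    inner = "+".join(f"<kbd>{escape(k)}</kbd>" for k in keys)
    return f"<kbd>{inner}</kbd>"


print(render_kbd("Ctrl+F"))  # <kbd><kbd>Ctrl</kbd>+<kbd>F</kbd></kbd>
```

One limitation of this sketch: a literal + key can't be written, since every + is read as a separator, so a real kbd role would need some escape for it.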

emilazy commented 1 year ago

This would make me very happy and secure my wavering Djot fandom. I agree that prefix is better for this and like @matklad's proposed syntax, but if it's considered too clash-prone then perhaps there could be a prefix punctuation character on top? (I'll throw @ into the bikeshed discussion if so.)