jgm / djot

A light markup language
https://djot.net
MIT License

Explicit callout/admonition types #196

Open toastal opened 1 year ago

toastal commented 1 year ago
{.warning}
:::
Watch out!
:::

::: warning
Watch out!
:::

Both of these are listed in the docs. This flexibility is great (especially when compared to folks abusing > blockquote syntax); however, I have some concerns from a higher level, where the user doesn't control the renderer and multiple platforms are implementing Djot. If these classes are all ad hoc and created by each user without any required consistency, how can embedded situations guarantee that a given block is a warning callout/admonition rather than a regular <div class="warning"> (or whatever the non-HTML output is), so that the renderer can react and style these blocks appropriately? If Djot sees greater adoption, it would be good to know that a README.dj will be rendered appropriately on a user's machine, on GitLab/Gitea/SourceHut/etc. Or say I added Djot as a way to compose long-form posts on my hot new ActivityPub social media platform, but my personal blog built with Djot supports only my version of admonitions.

Markdown not having these explicitly means you'd have to follow someone else's convention (which locks you into a single platform) or simply have it be unsupported. Admonitions are a common enough pattern that I see NOTE: in something like 70% of the docs and posts I read, and they are built into AsciiDoc, reStructuredText, and many Markdown forks (though I also see **NOTE:**, Note:, **Note:**, > **Note**, ℹ️, ℹ️ Note, | Note | |, and I think you see the point). A defined set assures the user of an expected output and creates a finite set that the renderer can style appropriately.

For reference, AsciiDoc supports: NOTE, TIP, IMPORTANT, CAUTION, WARNING

jgm commented 1 year ago

One thing to keep in mind is that we don't want to be English-centric. That means:

toastal commented 1 year ago

Those are definitely good points brought up a lot when these come up. Asciidoctor’s tooling translates these when you render, but it can be jarring to see English in a non-English document.

bpj commented 1 year ago

Yes, English words as markup in a document in another language is jarring, which is the main reason I don’t like TeX. Also remember that a word which has an “obvious” meaning in English may be a word in another language with a more or less unrelated meaning, which makes perfect sense for something else in that language. It happens with some frequency between European languages, especially with words of Greek/Latin/French origin.

That said it cannot always be avoided, as there is a limit to what meaning you can wring out of punctuation. However things like admonitions are properly just special-case divs, so handling them specially should be left to filters, renderers and postprocessors and not be hardcoded into the syntax of a general purpose lightweight markup language.

(I wrote some more things here which properly should go in their own discussion #197.)

vassudanagunta commented 1 year ago

Unfortunately, for the Markdown world at least, GitHub has chosen the English-language approach (and GitHub can effectively set de facto standards unilaterally):

> **Note**
> This is a note

> **Warning**
> This is a warning

djot could support explicit admonition types by using Unicode. It could, for example, use its existing fenced block syntax, with special treatment if the opening fence is followed by one of a set of admonition symbols. There is a set of Unicode symbols with predefined official meanings that make sense as admonition markers.

::: ℹ️ তথ্য
এটি বাংলায় একটি তথ্য নোট
:::

::: ⚠️ 小心
需要令牌“Warning”在其他語言中看起來很糟糕。
::: 

:::⚠️ Do not feed the dragons!
Unless you:

- are properly supervised by children.
- are wearing dragon pajamas
- feed them pancakes
:::

The latter would be equivalent to:

::: warning
## Do not feed the dragons!
Unless you:

- are properly supervised by children.
- are wearing dragon pajamas
- feed them pancakes
:::

Even for English this is cleaner and more natural. The plain text form looks close to the graphically rendered form. Admonitions often have a title ("⚠️ Warning" is a bit redundant), and one shouldn't have to use the heading hack and be forced to choose a heading level. Like a figure caption, block titles should have syntax support.
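To make the proposal concrete, here is a minimal Python sketch of how a parser might recognise a symbol after the opening fence. It is not how any djot implementation works; the symbol table, the type names, and the helper `parse_opening_fence` are assumptions made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical symbol table: the thread does not fix a set of symbols or type names.
SYMBOL_TYPES = {
    "\u2139": "info",             # ℹ
    "\u26a0": "warning",          # ⚠
    "\u2705": "success",          # ✅
    "\U0001F6AB": "prohibition",  # 🚫
}
VS16 = "\ufe0f"  # emoji variation selector that may follow the symbol


class Fence(NamedTuple):
    kind: Optional[str]  # admonition type, or None for an ordinary div
    title: str           # remainder of the opening line


def parse_opening_fence(line: str) -> Optional[Fence]:
    """Return a Fence for a ':::' opening line, or None if the line is not a fence."""
    m = re.match(r"^:{3,}\s*(.*)$", line)
    if m is None:
        return None
    rest = m.group(1).strip()
    if rest and rest[0] in SYMBOL_TYPES:
        # Symbol found: everything after it (minus the variation selector) is the title.
        title = rest[1:].lstrip(VS16).strip()
        return Fence(SYMBOL_TYPES[rest[0]], title)
    # No symbol: an ordinary div whose word is left to the renderer.
    return Fence(None, rest)
```

With this sketch, `:::⚠️ Do not feed the dragons!` yields the type `warning` with the rest of the line as its title, while `::: warning` is still treated as an ordinary div.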

This also allows for an even simpler and even cleaner syntax for single paragraph admonitions, the most common case:

ℹ️ This is an info message.

⚠️ Consider this a warning.

✅ This is an affirmation message.

🚫 Do not feed the dragons.

matklad commented 1 year ago

One approach would be to allow localizing the keywords, so that ::: important is exactly the same as ::: важно.

jgm commented 1 year ago

If we wanted to go the unicode symbol way, we could think of admonitions as a special case of lists with the marker ℹ️ or ⚠️. That would remove the need for the :::.

⚠️ Do not feed the dragons!

   Unless you:

   - are properly supervised by children.
   - are wearing dragon pajamas
   - feed them pancakes

jgm commented 1 year ago

Localizing is another option, but this adds a great deal of complexity. (You have to maintain big lists of localizations for umpteen languages.)

A lower-complexity option would be keeping the English words as defaults, but allowing alternatives to be defined e.g. through a configuration file.

matklad commented 1 year ago

A lower-complexity option would be keeping the English words as defaults, but allowing alternatives to be defined e.g. through a configuration file.

Yeah, that’s what I have in mind: English as default, but explicitly mention that implementations are allowed (and encouraged?) to provide synonyms for whatever languages they find useful.
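A rough Python sketch of the "English defaults plus synonyms" idea follows. The canonical set below borrows the five AsciiDoc names mentioned earlier in the thread purely as a placeholder, and `build_resolver` is a hypothetical helper, not part of any djot implementation.

```python
# Placeholder canonical types (an assumption; the thread has not settled on a set).
CANONICAL = {"note", "tip", "important", "caution", "warning"}


def build_resolver(extra_synonyms):
    """Build a keyword -> canonical type lookup from English defaults plus synonyms."""
    table = {name: name for name in CANONICAL}
    for word, canonical in extra_synonyms.items():
        if canonical not in CANONICAL:
            raise ValueError(f"unknown admonition type: {canonical!r}")
        # casefold() gives case-insensitive matching that also works beyond ASCII.
        table[word.casefold()] = canonical

    def resolve(keyword):
        # Returns None for words that are not admonition keywords.
        return table.get(keyword.casefold())

    return resolve


# Synonyms an implementation (or a configuration file) might supply.
resolve = build_resolver({"важно": "important", "Varning": "warning"})
```

Here `resolve("важно")` and `resolve("important")` map to the same type, and an unrecognised word returns `None`, so it can fall back to being an ordinary div class.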

vassudanagunta commented 1 year ago

If we wanted to go the unicode symbol way, we could think of admonitions as a special case of lists with the marker ℹ️ or ⚠️.

That would be beautiful. It subsumes the single paragraph admonitions I suggested.

I should have thought of that as I'm developing a meta plain text language, Plain Text Style Sheets, that supports arbitrary use of any of the four well-established and visually natural ways to support uniformly composable block containers:

Since admonitions of various types have established symbols understood across human languages, the list item-style structure works perfectly.

If sticking with ASCII was a requirement:

(!) Do not feed the dragons!

    Unless you:

    - are properly supervised by children.
    - are wearing dragon pajamas
    - feed them pancakes

toastal commented 1 year ago

These are all good proposals actually. If it were up to me, I like the idea of supporting both a Unicode symbol and an ANSI/ASCII-compatible option where feasible, as there will be situations where the author would prefer one or the other. The issue I see with the hanging indent option is that you have to line up after symbols that have different lengths (⚠️ is 1 char, /!\ is 3 char) vs. just using an indentation/tab.

...and @vassudanagunta the GitHub admonition syntax gaffe was exactly what I was referring to 😅

bpj commented 1 year ago

Configurability is the way to go, always. I would much rather use ::: Varning than :::Warning[^1] or even ⚠️ in a Swedish language document, but fundamentally I still think that ::: ANYTHING should be allowed but left up to renderers/filters what to make of them except for a few ::: + punctuation [+keyword] which might be used e.g. for metadata, parser/renderer/filter configuration (see #197 and links therein.)

Ideally you might for example pass something like this to e.g. a LaTeX renderer (using YAML for illustration purposes):

roles:
  div:
    Varning:
      begin: '\begin{admon}[text=Varning]{warning}'
      end: '\end{admon}'

(where admon is of course an environment for admonitions (a terrible hack))

and this to an HTML renderer:

roles:
  div:
    Varning:
      attrs:
        '+classes':
          - admonition
          - warning
      before-body: '<div class="admonition-type">&#x26a0; Varning</div>'

You get the idea!
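As a hedged illustration of how an HTML renderer might consume such a configuration, here is a small Python sketch. The `ROLES` dictionary simply mirrors the YAML above as a Python literal; `render_div` and the fallback behaviour for unknown names are assumptions for the sake of the example, not an existing renderer API.

```python
from html import escape

# The HTML roles configuration from above, written as a Python literal.
ROLES = {
    "div": {
        "Varning": {
            "attrs": {"+classes": ["admonition", "warning"]},
            "before-body": '<div class="admonition-type">&#x26a0; Varning</div>',
        }
    }
}


def render_div(name, body_html, roles=ROLES):
    """Render a '::: NAME' div, applying a configured role if one exists."""
    role = roles.get("div", {}).get(name)
    if role is None:
        # Assumed fallback: no role configured, so the name is kept as a plain class.
        return f'<div class="{escape(name, quote=True)}">{body_html}</div>'
    classes = " ".join(role.get("attrs", {}).get("+classes", []))
    before = role.get("before-body", "")
    return f'<div class="{classes}">{before}{body_html}</div>'
```

Calling `render_div("Varning", "<p>Se upp!</p>")` wraps the body in a div with the `admonition` and `warning` classes and prepends the configured label, while a name without a role passes through unchanged.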

[^1]: The difference is minimal but the one is misspelled in English, the other in Swedish!

toastal commented 1 year ago

@bpj How does configurability help with situations where you can't touch the configuration (as noted: a README.dj on a code forge, a social media post, etc.)?

The reason I opened this issue is to gauge if there is interest in solidifying a pattern for admonitions so writers can expect certain outputs (e.g. my admonition renders fine without tweaks on Codeberg and GitLab and SourceHut because the CSS knows to handle admonition syntax because it's standardized and not an ad-hoc convention that each platform handles differently).

vassudanagunta commented 1 year ago

The issue I see with the hanging indent option is that you have to line up after symbols that have different lengths (⚠️ is 1 char, /!\ is 3 char) vs. just using an indentation/tab.

@toastal, in djot list item indents don't have to match the marker length; you're thinking of CommonMark. A single space indent is sufficient.

I still think that ::: ANYTHING should be allowed but left up to renderers/filters what to make of them

@bpj, you already have that, as djot Div blocks treat ANYTHING as a class value, which renderers/filters can do with as they please. This issue is not about Div blocks, but about "explicit admonition types", i.e. "first class admonitions" that would be represented in the AST as such.

bpj commented 1 year ago

@vassudanagunta read again my comments and the discussion linked from it.

I am opposed to admonitions and similar presentational stuff as first class objects and instead propose a mechanism where you can use ::: SOMETHING as a hint to renderers/filters (where handling stuff as admonitions IMO belongs) to treat the div specially (as and if they want to, e.g. as admonitions) without SOMETHING ending up as a class in HTML attributes, unless a renderer/filter puts it there, i.e. a "hint" should be a different thing than an attribute and stored differently in the AST.
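One way to picture the distinction between a hint and an attribute is the Python sketch below. The `Div` node with a separate `hints` field, `html_open_tag`, and `admonition_filter` are all hypothetical names; djot's actual AST has no such field.

```python
from dataclasses import dataclass, field


@dataclass
class Div:
    classes: list = field(default_factory=list)   # rendered as HTML class attribute
    hints: list = field(default_factory=list)     # hypothetical: never rendered directly
    children: list = field(default_factory=list)


def html_open_tag(div):
    """Only classes reach the output; hints are invisible to the renderer."""
    if div.classes:
        return '<div class="%s">' % " ".join(div.classes)
    return "<div>"


def admonition_filter(div):
    """An opt-in filter that turns a 'warning' hint into admonition classes."""
    if "warning" in div.hints:
        div.classes += ["admonition", "warning"]
    return div
```

Without the filter, a div carrying only the hint renders as a bare `<div>`; the classes appear only when a filter chooses to add them.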

toastal commented 1 year ago

@bpj then should > be removed because you could just make a block with ::: blockquote and hint that this <div> should maybe be rendered as a <blockquote>? Personally, I see this 'Note:' pattern so often that it feels exceptional and in the same category as blockquotes. It's been my experience that almost all documentation systems eventually have to cross admonitions as a hurdle, and when they do, they necessarily have to either flavor the syntax, hard fork, or rely purely on a stringly-typed convention like you noted with ::: SOMETHING.

bpj commented 1 year ago

@toastal I'm not in the least attached to blockquotes. Their special treatment is an historical holdover, not an example to emulate. Note that SOMETHING needn't be a word. I'd be perfectly cool with :::❞ for blockquotes! 😁

As much as I work with documentation for a living my point is that djot is a general purpose markup language, and I hope it remains so. Adding special purpose syntax is a slippery slope: whose special purpose should be accepted or rejected? It is not feasible to support everyone's. It is better to add general purpose extensibility which can be used for special purposes. If you want to write docu-html-renderer or docu-filter-suite making use of such extensibility more power to you and I would probably contribute.

I write other things too, all of them using Markdown and Pandoc, and in the future hopefully djot and Pandoc. They all have their special purpose needs. For some of them I would like to have more kinds of emphasis, but I must bloody well accept that I have to use spans with attributes for that. I just wish that some attributes which actually are hints to filters to do things for other formats (most often inject LaTeX code) wouldn't end up in my HTML attributes, without me writing lots of bespoke filters to remove them. I think a "hint" syntax which behaves that way is the way to go with special purpose stuff.

vassudanagunta commented 1 year ago

@bpj,

I am opposed to admonitions and similar presentational stuff as first class objects

Not everyone considers admonitions presentational, no more than exclamation points, emphasis, lists (Why not just use "," and "and"?) or even paragraphs (One might argue that breaking up content into chunks is just presentational, as well as introducing pauses in one's speech).

How one styles an admonition (e.g. enclose in a box? background color? How to represent for accessibility) is presentation.

I'm not in the least attached to blockquotes. Their special treatment is an historical holdover

Block quotes represent semantic structure, not just a presentational choice versus inline quotes. Block quotes can have headings, lists and tables. And very importantly, the headings don't participate in the containing content's section hierarchy ("sectioning roots" in HTML parlance).

Likewise any heading structure within an admonition, or any other kind of aside that might be supported, would also not participate in the document outline.
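The outline behaviour described here can be sketched in a few lines of Python. The `Node` type and the choice of which containers count as sectioning roots are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                 # e.g. "doc", "heading", "blockquote", "admonition", "para"
    text: str = ""
    level: int = 0
    children: list = field(default_factory=list)


# Assumption: containers whose headings stay out of the document outline.
SECTIONING_ROOTS = {"blockquote", "admonition"}


def outline(node):
    """Collect (level, text) for headings, skipping anything inside a sectioning root."""
    entries = []
    for child in node.children:
        if child.tag in SECTIONING_ROOTS:
            continue  # headings in here do not participate in the outline
        if child.tag == "heading":
            entries.append((child.level, child.text))
        entries.extend(outline(child))
    return entries
```

For a document with an "Intro" heading, an admonition containing its own heading, and a "Usage" heading, `outline` returns only the two outer headings.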

There is a difference between presentational structure and semantic structure. That some people use (or "abuse" if you want to be judgmental) semantic features for presentation is beside the point.

bpj commented 1 year ago

The fact remains that you cannot support everyone's special purpose elements, whether you call it semantic or presentational, in core. If you don't think that admonitions are a special purpose: what use does an academic writing their thesis have for them? You don't want headings in your "hinted" block to count towards the structure of the document, fine; one feature of "hinted" blocks should probably be that you can tell the parser to treat them as "isolated" mini documents. That's likely to be useful for many purposes.

jgm commented 1 year ago

I think this is a matter of striking the right balance, which is hard. True, admonitions don't occur in academic writing. But then, footnotes don't often occur in non-academic writing, and math doesn't occur in non-scientific writing, but we have these elements.

mcookly commented 1 year ago

The unicode idea is interesting. I wonder how easy it would be to type unicode on a generic mobile keyboard, however. Letting the user configure the admonition language is a great option too, but it seems to make Djot less renderer- and platform-fluid than Markdown (which is a negative, I assume?).

Using ANSI/ASCII indicators still makes the most sense to me. Perhaps something like this could be used:

::: (!)
This is a warning
:::

::: (+)
This is an affirmation
:::

::: (?)
This is an info message
:::

::: (-)
This is a negation
:::

toastal commented 1 year ago

If anything, I'd say mobile has an easier time doing emoji as, in my experience, mobile keyboards almost always come with an emoji input--including on Ubuntu Touch. On my Linux laptop here, I'll use an add-on on LibreWolf or Unicode input on Kitty terminal--neither of which feel great to use. That said, my stance would be, if accepted, both are supported.

mcookly commented 1 year ago

If anything, I'd say mobile has an easier time doing emoji as, in my experience, mobile keyboards almost always come with an emoji input

I completely forgot about emojis on mobile 🤦🏻‍♂️

Omikhleia commented 1 year ago

Upon an initial thought, I didn't see why Djot would need a specific markup for admonitions. I would have been tempted to use a definition list, with some class attribute to control the rendering.

{ .warning }
:  Here be dragons

   Be very careful here

Why definition lists? As far as I know, it's currently the only block construct in Djot where we can have items with two marked-up elements, which besides the usual interpretation (term and definition), can be used here as title and content.[^1]

It can already be used in current versions, and considering HTML conversion, it's fairly easy to style as felt appropriate.

But if preserving semantics would be preferred, then perhaps another syntax would be needed indeed --- but it would have, imho, to keep the same general simplicity.

Personally, I am a bit more reluctant with respect to using emojis, specific ASCII indicators (such as (!) etc.) or specific Unicode character to denote the "level" of such things, because the possible categorizations are quite limitless (warning, error, info, note, caution, danger, example, ...) and there would always be someone needing a new unforeseen category for which no non-ambiguous character exists.

[^1]: Quote source/attribution is also another case where at least two elements are needed, see #198.

toastal commented 1 year ago

I would disagree with overloading definition lists in the same way that I object to Microsoft GitHub-flavored Markdown’s overloading of the semantics of the blockquote element, which muddies their meanings and adds weirdness to the parsing (if definition list, but the special syntax one). I find ASCII and/or Unicode a very clear representation of the concept both in rendered and plaintext. I also don’t like the idea of magic classnames like .warning as opposed to something more obvious and less verbose in its plaintext representation, and ideally independent from the English language as many pointed out.

possible categorizations are quite limitless (warning, error, info, note, caution, danger, example, ...

This is true, but there is a safe number of admonition types that can be taken from an established lightweight markup language such as AsciiDoc to see what types are required. A quick callout to @mojavelinux might answer the question of whether there have really been requests for more admonition types beyond their current set (with that set consisting of NOTE, TIP, IMPORTANT, CAUTION, & WARNING). AsciiDoc’s keywords are English, but can be translated via a CLI language switch.

If the goal of Djot is to be a Markdown replacement (yes please), then being able to be embedded inside of other systems (such as support in programming language comments, or social media posts, etc.) is pretty important, and forcing specific CSS classes will cause bugs/incompatibilities compared to first-class support (though naming conventions can mitigate this to a degree, such as CSS Bliss’s convention of Admonition Admonition--danger).

Omikhleia commented 1 year ago

(...) and ideally independent from the English language as many pointed out.

Out of curiosity, about pseudo-class names in attributes being in English: For print-oriented output, how would you mark a header to be unnumbered in Djot? Unless I missed it, it doesn't seem to me that Djot (current specification and Lua implementation) supports { - } as in (some extended-)Markdown.... So { .unnumbered } might still be a way... Likewise, how would you mark a specific section header that should not be included in a ToC? .notoc or toc=false? However I look at it, I don't see how to avoid language-specific markup... How would you specify a language on a text span, for correct hyphenation patterns and other language-specific rules to apply? [Je suis français!]{ lang="fr" }? Again, I don't see how one can fully avoid a fixed-named attribute keyword, likely to be in some language... and differing strategies for rendering these in output engines, be it an xml:lang in HTML, or something else adequate, say, in LaTeX or SILE, etc. --- So it's a much broader discussion indeed, but I don't see why "admonitions" would be that different.

Omikhleia commented 1 year ago

By the way:

(...) markup language such as AsciiDoc to see what types are required.

Sorry to ask, by the way, but who says AsciiDoc has it right about "required" types for generic markup and would be authoritative about what's needed?

If I get you right (I have little experience with it), AsciiDoc has 5 admonitions types: note, tip, important, caution, warning.

Heh, MkDocs has 12 of them: note, abstract, info, tip, success, question, warning, failure, danger, bug, example, quote.

These don't even really fully overlap :)

What would be a "safe number of admonition types from an established lightweight markup language", then? It sounds like an argument from authority, and these seldom come right... ^^

As for semantic mark-up, admittedly non-lightweight, DocBook has an epigraph different from a (block)quote...

EDIT: reStructuredText has 9+1 types: attention, caution, danger, error, hint, important, note, tip, warning... and a generic admonition type. No clean overlap again.
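For what it is worth, the overlap between the three lists as quoted in this thread can be checked mechanically. The Python sketch below only uses the names listed above (reStructuredText's generic admonition type is left out); it makes no claim about the upstream projects beyond what is quoted here.

```python
# Admonition type names exactly as listed in this thread.
ASCIIDOC = {"note", "tip", "important", "caution", "warning"}
MKDOCS = {
    "note", "abstract", "info", "tip", "success", "question",
    "warning", "failure", "danger", "bug", "example", "quote",
}
RST = {
    "attention", "caution", "danger", "error", "hint",
    "important", "note", "tip", "warning",
}

# Names shared by all three lists.
common = ASCIIDOC & MKDOCS & RST

# Names that appear in only one of the three lists.
unique = {
    "asciidoc": ASCIIDOC - MKDOCS - RST,
    "mkdocs": MKDOCS - ASCIIDOC - RST,
    "rst": RST - ASCIIDOC - MKDOCS,
}

print(sorted(common))
```

On these lists the shared names are note, tip, and warning; the five AsciiDoc names all appear in the reStructuredText list, while most of the MkDocs names appear nowhere else.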

toastal commented 1 year ago

attributes being in English

This goes back to @bpj’s point: does the syntax need this feature? The more prevalent the feature, the more you want it to get a memorable, easy, first-class syntax, whereas obscure features likely deserve either to be omitted or to have a less ergonomic syntax, which will likely involve additional characters that can’t/shouldn’t be described in ASCII or Unicode. My personal anecdote is from the perspective of blogging & technical writing/documentation, where I’m hard-pressed to find content where I’m not wanting this feature, such as languages where code comments are parsed and rendered into documentation and you would want richer docs for authors and library consumers. Someone using Djot syntax in the vein of a LaTeX substitute might find they almost never need it, which is what @jgm was pointing out. I’m hoping to persuade that it should be first-class because my biased perspective sees admonitions as necessary/ubiquitous/inevitable for users of all languages in a variety of contexts, but a line could be drawn for them to be second-class or omitted.

who says AsciiDoc has it right [..]?

I offered it as one suggestion since it’s the syntax I have the most experience with. Those other options are good too & it could be useful to ask the authors of these syntaxes for their wisdom on the matter if it were deemed a valuable feature to add. Seeing what others are doing is useful though, so thank you for listing them here.

MkDocs

In my opinion the MkDocs example seems more tied to the icon than an ‘actual’ admonition type (explaining the last section about custom types). I’m unsure why quote is here over > or why abstract would need to be ‘in a box’ like other admonitions. Success/failure seems like weird states too for writing and with caution/danger already being arbitrary enough to need an explanation, I am unsure how bug is different from either of those.

Omikhleia commented 1 year ago

I’m unsure why quote is here over > or why abstract would need to be ‘in a box’ like other admonitions.

I can't speak for MkDocs, but regardless of the rendering (in a box or not), I would wonder whether it's due to the fact that the syntax allows for two elements (title and content, in the case of admonitions; content and attribution, in the case of quotes; I can't say what the intent would be for abstracts never having used them).

Hence my linking to #198 above, there's a kind of a common pattern -- Currently, there's no "official" way for such things with two elements, and the only closest equivalent syntax we have is the definition list, for the best or the worst of it...

Someone using Djot syntax in the vein of a LaTeX substitute might find they almost never need it.

Indeed, it's always a difficult decision to assess, but how to know for sure? Personally, I'd have needed admonitions (e.g. "hints" and "cautions" mostly) in a manual I wrote partly in Djot. I also plan on using (Markdown or) Djot in novel-type books, where epigraphs and attributed quotes do occur (on the other hand, these don't care much about maths or fenced code blocks, lol)... So I do tend to think all of these are worth being first-class syntax structures...

toastal commented 1 year ago

epigraph

Along this train, an abstract / backstory are common initial units to posts or chapters (both of which appear in the BlogPosting schema).

mojavelinux commented 1 year ago

but who says AsciiDoc has it right about "required" types for generic markup and would be authoritative about what's needed

AsciiDoc did not choose these names. They come from DocBook; see https://tdg.docbook.org/tdg/5.1/important.html for an example. They represent a certain severity/weight of admonition.

I've never seen a good argument for adding more. More options just confuse people, and, in my experience supporting users for over a decade, people are already confused enough about the difference between the five that DocBook/AsciiDoc already have.

AsciiDoc has the concept of roles, which allow you to add variation to any one of the types (so you can, in theory, have an infinite number that way). This has always been considered enough. It also allows AsciiDoc to cleanly map to DocBook, which has been a core part of the language's ethos (though not a hard and fast rule).

I'm only offering this information for context. I don't have any stake in the decision on this issue.