jgm / djot

A light markup language
https://djot.net
MIT License

Relax disallowing multiple words in `code block` / `div` first line? #214

Open chrisjsewell opened 1 year ago

chrisjsewell commented 1 year ago

Currently

````
```name
content
```

:::name
content
:::
````

goes to

```
code_block text="content\n" lang="name"
div class="name"
  para
    str text="content"
```

but

````
```name a
content
```

:::name a
content
:::
````

goes to

```
para
  verbatim text="name a\ncontent\n"
para
  str text=":::name a"
  soft_break
  str text="content"
  soft_break
  str text=":::"
```

This feels a little unintuitive to me. Is there a strong reason why this has to be the case?

Could the "additional" first-line content not be stored on the AST nodes?
It would then not be used in the standard HTML renderer,
but could be used by macros.

jgm commented 1 year ago

Let's see what GFM does with it:

````
``` python more
hi
```
````

becomes

``` python more
hi
```
and there is no trace of `more` in the rendered HTML. So this behavior implements a kind of standard. But you're right that we could, in principle, treat the rest as additional attributes. But how? Split by spaces and make them classes? What if there is punctuation not normally allowed in classes?
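To make the "split by spaces" question concrete, here is a minimal Python sketch. It is not djot's or Pandoc's code: `split_info` and the `SAFE_CLASS` pattern are hypothetical names, and the pattern is a deliberately conservative assumption about what counts as a usable class name.

```python
import re

# Assumption for this sketch: a "safe" class is a letter or underscore
# followed by letters, digits, underscores, or hyphens. Real HTML/CSS
# rules are more permissive than this.
SAFE_CLASS = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def split_info(info: str):
    """Split a first-line string on whitespace into (classes, rejected)."""
    classes, rejected = [], []
    for word in info.split():
        if SAFE_CLASS.match(word):
            classes.append(word)
        else:
            rejected.append(word)
    return classes, rejected


print(split_info("python more"))
# (['python', 'more'], [])
print(split_info('python {cmd="x"} 1,3-5'))
# (['python'], ['{cmd="x"}', '1,3-5'])
```

The second call shows the problem raised above: once the extra words contain punctuation, a class-based scheme has to either reject them or escape them somehow.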

chrisjsewell commented 1 year ago

> Let's see what GFM does with it

CommonMark stores the entire first line as info: https://spec.commonmark.org/dingus/?text=%60%60%60name%20a%0Acontent%0A%60%60%60%0A%0A

jgm commented 1 year ago

I've revised my note above.

chrisjsewell commented 1 year ago

> Split by spaces and make them classes? What if there is punctuation not normally allowed in classes?

Do you need to split it at all? Just have the whole string be the lang.

For div, hmmm; firstly I would ask, is there really a need for this "store the first word as a class" semantics?

You already have block attributes for setting classes, so why not just store the whole string under a key as well and be done with it? 😄

chrisjsewell commented 1 year ago

(thanks as always for the ~~rabid~~ rapid replies!)

jgm commented 1 year ago

rabid 🐕

Pandoc doesn't store these in the lang attribute; it adds them as classes. (This is the way it has always behaved, and changing it now is probably not a good idea.)

chrisjsewell commented 1 year ago

> rabid 🐕

😆 🤦‍♂️

chrisjsewell commented 1 year ago

> Pandoc

https://pandoc.org/try/?params=%7B%22text%22%3A%22%60%60%60a+b%5Cncontent%5Cn%60%60%60%5Cn%5Cn%3A%3A%3Aa+b%5Cncontent%5Cn%3A%3A%3A%22%2C%22to%22%3A%22json%22%2C%22from%22%3A%22markdown%22%2C%22standalone%22%3Afalse%2C%22embed-resources%22%3Afalse%2C%22table-of-contents%22%3Afalse%2C%22number-sections%22%3Afalse%2C%22citeproc%22%3Afalse%2C%22html-math-method%22%3A%22plain%22%2C%22wrap%22%3A%22auto%22%2C%22highlight-style%22%3Anull%2C%22files%22%3A%7B%7D%2C%22template%22%3Anull%7D

> This is the way it has always behaved, and changing it now is probably not a good idea

Does djot have to do what Pandoc does, though?

chrisjsewell commented 1 year ago

It seems like Pandoc does not follow CommonMark here 🤔 https://spec.commonmark.org/0.30/#info-string

jgm commented 1 year ago

Pandoc's commonmark and gfm and commonmark_x parsers will ignore the additional content in the case of code blocks. This could be modified to store the whole line in an info attribute, or perhaps to do so only if this content would differ from the class already stored.

Pandoc's markdown parser is different. Part of the motivation here is to avoid confusing inline code that happens to start at the beginning of the line and uses three backticks with a code block.
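As a rough illustration of the "store the whole line only if it differs from the class" option, here is a small Python sketch. It is not Pandoc's implementation: the function name `code_block_attrs` and the attribute keys `class` and `info` are assumptions made for this example.

```python
def code_block_attrs(first_line: str) -> dict:
    """Build attributes for a fenced code block from its first line.

    The first word becomes the class; the full trimmed line is kept under
    a hypothetical "info" key only when it carries extra content.
    """
    info = first_line.strip()
    words = info.split()
    attrs = {}
    if words:
        attrs["class"] = words[0]
        if info != words[0]:
            attrs["info"] = info
    return attrs


print(code_block_attrs("python"))
# {'class': 'python'}
print(code_block_attrs(" python more "))
# {'class': 'python', 'info': 'python more'}
```

With this shape, existing consumers that only look at the class see no change, while the extra first-line content stays available to anything that wants it.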

chrisjsewell commented 1 year ago

> Part of the motivation here is to avoid confusing inline code that happens to start at the beginning of the line and uses three backticks with a code block.

It feels like, if you have "committed" to writing three backticks at the start of the line, then you are expecting to write a code block. I can't imagine a case where you would actually want this to be inline. Note that CommonMark prohibits backticks in the info string of a backtick fence (such a line is parsed as inline content instead), so you can still write inline:

````
```inline something```
````

just not

````
```inline something
````

jgm commented 1 year ago

``` ``Markdown code spans with ` inside them`` ``` can be quoted with ` ``` `.

Let's see how GH renders it:

``Markdown code spans with ` inside them`` can be quoted with ```.
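For comparison, the CommonMark rule mentioned earlier in the thread (no backticks in the info string of a backtick fence) can be sketched in a few lines of Python. This only illustrates that one rule, not djot's current behavior or a full parser; `backtick_fence_info` is a hypothetical name.

```python
import re

# Up to three spaces of indentation, a run of three or more backticks,
# then the rest of the line (the candidate info string).
FENCE = re.compile(r"^ {0,3}(`{3,})(.*)$")


def backtick_fence_info(line: str):
    """Return the info string if `line` opens a backtick fence, else None."""
    m = FENCE.match(line.rstrip("\n"))
    if m is None:
        return None
    info = m.group(2)
    if "`" in info:
        # A backtick after the fence run means this is inline content.
        return None
    return info.strip()


print(backtick_fence_info("```inline something"))
# inline something
print(backtick_fence_info("```inline something```"))
# None
print(backtick_fence_info("``` ``Markdown code spans with ` inside them`` ``` can be quoted with ` ``` `."))
# None
```

Under this rule the line quoted above is not a fence opener even though it starts with three backticks, while a multi-word first line without backticks still is.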