jgm / djot

A light markup language
https://djot.net
MIT License

Multiple representations of thematic break contradicts goal 11 #59

Open oxalica opened 1 year ago

oxalica commented 1 year ago

The syntax spec says,

A line containing three or more * or - characters, and nothing else (except spaces or tabs), is treated as a thematic break (<hr> in HTML).

Then they went to sleep.

      * * * *

When they woke up, ...

We already enforce a canonical representation for headings and code blocks, and I don't see any rationale for keeping both *** and --- for <hr>. Allowing an arbitrary length >= 3, plus spaces in between, also seems to complicate the syntax unnecessarily: it requires infinite lookahead in the parser (though parsing is still linear) to tell ----------- apart from -----------text. Since this marker can only appear as a block of its own, there's no real need to extend its length to "align with some other text".
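To make the lookahead point concrete, here is a minimal Python sketch of a recognizer for the rule as quoted above. It is not taken from any djot implementation and the function name is hypothetical; it assumes (one literal reading of the quoted sentence) that * and - may be mixed on one line and that leading indentation is allowed, as in the example. Any character other than a marker, space, or tab disqualifies the line, so the parser cannot classify ----------- versus -----------text until it has scanned to the first non-marker character or the end of the line, however far away that is.

```python
def is_thematic_break(line: str) -> bool:
    """Hypothetical recognizer for the quoted rule: three or more '*' or '-'
    characters and nothing else except spaces or tabs.

    Assumption: '*' and '-' may be mixed on one line (literal reading of the
    rule); an actual djot parser may be stricter.
    """
    markers = 0
    # Every character must be inspected: one stray character anywhere
    # (e.g. the 't' in '-----------text') means the line is not a break.
    for ch in line.rstrip("\r\n"):
        if ch in "*-":
            markers += 1
        elif ch not in " \t":
            return False
    return markers >= 3


examples = ["---", "      * * * *", "-----------", "-----------text", "--"]
for line in examples:
    print(f"{line!r:20} -> {is_thematic_break(line)}")
```

The loop prints True for the first three lines and False for the last two; the decision for -----------text only comes after eleven dashes have been consumed.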

Personally, I always use --- (exactly three characters) in Markdown because it looks more like the rendered horizontal rule (hr).

ad-si commented 7 months ago

I just read the specification and this immediately caught my eye. I'm also voting for --- as the one and only option.

jgm commented 7 months ago

The infinite lookahead point is a good one, so I think we need at least some change.

Of course, we could avoid infinite lookahead while retaining more possibilities for thematic breaks by putting an upper bound on the length of the thematic break.
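A minimal sketch of the upper-bound idea, again in Python and purely illustrative: the cap of 16 characters (MAX_BREAK_LEN) is a made-up value, not something the spec or this thread proposes, and counting spaces and tabs toward it is a choice of this sketch. The function assumes the line terminator has already been removed and never examines more than max_len + 1 characters, so the lookahead is bounded.

```python
MAX_BREAK_LEN = 16  # hypothetical cap; no such limit exists in the djot spec


def is_thematic_break_bounded(line: str, max_len: int = MAX_BREAK_LEN) -> bool:
    """Bounded variant: a thematic break may be at most max_len characters
    long (spaces and tabs included). Assumes the line terminator is removed."""
    # Only the first max_len + 1 characters are ever looked at.
    window = line[: max_len + 1]
    if len(window) > max_len:
        return False  # too long to be a break, whatever follows
    markers = 0
    for ch in window:
        if ch in "*-":
            markers += 1
        elif ch not in " \t":
            return False
    return markers >= 3


print(is_thematic_break_bounded("* * *"))   # True
print(is_thematic_break_bounded("-" * 40))  # False: longer than the cap
```

The trade-off is visible in the second call: a 40-dash line, which is a valid break under the current rule, would stop being one under such a cap.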

One advantage of *** over ---, by the way, is that *** doesn't have any other assigned use, while --- could (if it weren't used for thematic breaks) be used to write a paragraph containing just a single em dash.
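The collision can be sketched as two stages, block recognition before inline processing. This Python sketch is hypothetical and heavily simplified: the thematic_breaks switch only exists to show what would happen without the break rule, and the inline stage knows a single rule (--- becomes an em dash, U+2014) and leaves everything else, including ***, as literal text. That inline stage is an assumption standing in for djot's real inline parser.

```python
EM_DASH = "\u2014"


def is_break(line: str) -> bool:
    """Simplified block rule from the quoted spec: three or more '*'/'-',
    otherwise only spaces or tabs."""
    body = line.replace(" ", "").replace("\t", "")
    return len(body) >= 3 and set(body) <= {"*", "-"}


def render_line(line: str, thematic_breaks: bool = True) -> str:
    """Render one line as HTML. The block-level check runs first, so a lone
    '---' never reaches the inline em-dash rule while breaks are enabled."""
    if thematic_breaks and is_break(line):
        return "<hr>"
    # Hypothetical inline stage: only '---' -> em dash; all else is literal.
    return "<p>" + line.strip().replace("---", EM_DASH) + "</p>"


print(render_line("---"))                         # <hr>
print(render_line("***"))                         # <hr>
print(render_line("---", thematic_breaks=False))  # a paragraph holding one em dash
print(render_line("***", thematic_breaks=False))  # <p>***</p>
```

In this model, reserving --- for breaks costs the one-em-dash paragraph, while reserving *** costs nothing, since it would otherwise be plain text.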

oxalica commented 7 months ago

One advantage of *** over ---, by the way, is that *** doesn't have any other assigned use, while --- could (if it weren't used for thematic breaks) be used to write a paragraph containing just a single em dash.

That's fair. I don't have a strong opinion between *** and ---. I just prefer a single canonical form, ideally without spaces in between.

I also want to add that the example code (with spaces) looks too similar to crontab syntax, where * * * * * actually has a meaning: every minute.

Then they went to sleep.

      * * * *

When they woke up, ...

So at first glance, this may look like they sleep for one minute (or every minute), as if someone had set a cron-job alarm as a joke. :thinking:

jgm commented 7 months ago

An advantage of * * * is that it's got quite an old history for this use, going back to typewriter days if I can remember.

Omikhleia commented 7 months ago

An advantage of * * * is that it's got quite an old history for this use, going back to typewriter days if I can remember.

As far as old typescripts go, well, that could just as well be true for - - - or -------, etc. :D Sometimes even in combination with various *-based patterns, when the typist needed several kinds of breaks (in novels, etc.), from asterisms to dinkuses and other pendants... So that might not be a very good reason ;)

bpj commented 7 months ago

Well, various asterisk-based patterns were used in printed books long before typewriters, including three asterisks with spaces between them, albeit usually wider spaces.

uvtc commented 7 months ago

As an aside, it would be nice if djot users could easily get a centered * * * when they want one. In the past (with pandoc) I've done something like <center>\* \* \*</center>, because it gave a nice small thematic break without the full horizontal line.

If djot had a syntax for centered text, I'd use that when I wanted the centered * * *, and then just use its current thematic break syntax for the full HR.

Regardless, to reduce any potential ambiguity, I think it's a good idea to require the interstitial spaces in * * * or - - - for the HR syntax.

Omikhleia commented 7 months ago

@uvtc Perhaps I read it wrong, but what does centering and rendering have to do with Djot (syntax)? The renderer (be it HTML or other) is responsible for interpreting and styling it. No?

bpj commented 7 months ago

@uvtc Since in djot even thematic breaks can take attributes, you could easily use something like ***{.asters} and then have the renderer, a filter, or a post-processor replace it with the desired (raw) HTML. You could probably also do it with some wicked CSS.

Omikhleia commented 7 months ago

@bpj Actually, the thematic break is a block element, so its attributes have to go on the line before it (with blank lines around the whole thing):

{.asters}
***

By contrast, ***{.asters} is a (styled) inline element consisting of three asterisks, not a thematic break.

(The same goes for ---, which in an inline context is additionally interpreted as an em dash.)
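bpj's post-processor idea, combined with the block-attribute placement shown above, could look roughly like this hypothetical Python sketch. It assumes the HTML renderer emits the attributed break as an hr tag whose only attribute is class="asters"; the exact spelling of that tag (with or without a trailing slash) is an assumption, so the pattern accepts both. The class name and the replacement markup are illustrative only.

```python
import re

# Matches <hr class="asters">, <hr class="asters"/> and <hr class="asters" />.
# Assumption: class is the only attribute the renderer puts on the tag.
ASTERS_HR = re.compile(r'<hr\s+class="asters"\s*/?>')

# Hypothetical replacement: a small centered asterism instead of a full rule.
ASTERISM_HTML = '<p class="asters" style="text-align: center">* * *</p>'


def replace_aster_breaks(html: str) -> str:
    """Post-process rendered HTML, swapping styled thematic breaks for * * *."""
    return ASTERS_HR.sub(ASTERISM_HTML, html)


html = '<p>Then they went to sleep.</p>\n<hr class="asters">\n<p>When they woke up, ...</p>\n'
print(replace_aster_breaks(html))
```

Plain breaks without the class are left untouched, so the full horizontal rule stays available.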

uvtc commented 7 months ago

Thanks, all. Yes, as was also pointed out to me in https://github.com/jgm/djot/discussions/205, I see that I could do

{.center}
\* \* \*

and have some CSS to center that.