jgm / djot

A light markup language
https://djot.net
MIT License

Add an optional significant-newlines mode without hard-wrapping accommodation? #161

Open andersk opened 1 year ago

andersk commented 1 year ago

I really appreciate the thoughtful, principled design of djot! But when I showed it to the rest of the Zulip team as a potential foundation for replacing our (slow, buggy, not-quite-standard) Markdown implementation, the one main point of pushback was on the inability to interrupt a paragraph with a list. “Requiring people to put a blank line between text and a list works out very poorly—Zulip’s markup originally worked that way, because Markdown did, and it was a source of constant complaints.”

This is a tricky dilemma. I do understand djot’s decision, and I think it’s probably correct for most use cases. But it seems to be a sticking point for many potential users. Some previous discussions that hopefully don’t need to be rehashed here:

I have one observation to maybe add. In practice, Markdown is typically used in two modes today:

  1. A collapsed-newlines mode where text may be freely hard-wrapped in the source without affecting the output. Most useful for longer documents written in a real editor, checked into version control, and maintained over time, where source code readability is important. Example: GitHub’s rendering of Markdown files in the file browser.

  2. A significant-newlines mode where line breaks in the input are passed through to the output (via <br> conversion or white-space: pre-wrap styling). Most useful for shorter comments and chat messages written in a dumb textbox, perhaps by users who don’t want to think as much about formatting, typically just written and rendered once. Example: GitHub’s rendering of Markdown comments in issues.

The users who complain about blank lines before lists are more likely to be writing in the latter mode, which is also the mode where hard-wrapping doesn’t make sense. Therefore, perhaps a solution to the dilemma would be to standardize slightly different parsing rules for these two modes, and restrict the hard-wrapping accommodations to the former mode? A blank line before a list would be required in collapsed-newlines mode, and optional in significant-newlines mode.
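As a rough illustration of the two modes, here is a minimal sketch in Python. It is not djot's actual renderer: `render_paragraph` and its `significant_newlines` flag are hypothetical names, and inline markup is ignored entirely.

```python
import html


def render_paragraph(source: str, significant_newlines: bool) -> str:
    """Render one plain-text paragraph to HTML in either newline mode.

    Hypothetical helper for illustration only; inline markup is not parsed.
    """
    lines = [html.escape(line.strip()) for line in source.splitlines() if line.strip()]
    if significant_newlines:
        # Significant-newlines mode: every source newline becomes a hard break.
        body = "<br>\n".join(lines)
    else:
        # Collapsed-newlines mode: the newline is kept as a soft break,
        # which a browser collapses to a single space when rendering.
        body = "\n".join(lines)
    return f"<p>{body}</p>"


if __name__ == "__main__":
    text = "Meeting notes\nfrom Tuesday"
    print(render_paragraph(text, significant_newlines=False))
    print(render_paragraph(text, significant_newlines=True))
```

The only difference between the modes in this sketch is how a newline inside a paragraph is emitted; the proposal additionally changes whether a list may follow without a blank line.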

I know it’s not ideal to have the parsing rules diverge like this. But maybe it’s a reasonable compromise given that both modes are likely to exist anyway? It does make some intuitive sense that a 1. following a newline should be treated as more “intentional” when the newline itself would be significant in the output.

gavinhoward commented 1 year ago

I'm not the creator, just someone who has gone all-in on djot for my software documentation. This is my own opinion, not anyone else's.

I think you have a point. However, I'm not sure the world would be best served by having two modes of djot when Markdown serves well enough in the significant-newlines mode.

I personally think that djot is great for documentation checked into version control and that it should stay designed that way. On the other hand, I also think Markdown with significant newlines would be better for comment systems since Markdown is close to what people would write in plaintext anyway.

If people really want djot for comment systems, I think it would be better to make a new language specifically for that, with different design goals (such as removing goal 7). I'll refer to it as "tojd" (djot backwards) for now.

I think a tojd language could be useful for the simple fact that it could share most of a parser with djot, but it would also give those who hate goal 7 a way out. It would also be well-optimized for comment systems.

And I think it should be a separate language so that each language could serve best the use case it is useful for: djot for documentation and tojd for comment systems, especially since those use cases have incompatible design constraints. It would also be clearer to users of both when they are using one or the other.

That said, I can't say if @jgm would even be interested in helping define such a language, though I might be.

timabbott commented 1 year ago

> I think you have a point. However, I'm not sure the world would be best served by having two modes of djot when Markdown serves well enough in the significant-newlines mode.

I feel like that'd be a bad solution. Sure, you can always use Markdown or CommonMark instead, but djot is a better design in many ways that have been clearly articulated by @jgm, and it'd be great if most of the ecosystem of projects that currently use Markdown or CommonMark could migrate to djot in order to benefit from those improvements. Today, projects using original Markdown rather than CommonMark are essentially all those for whom legacy considerations make migration hard; I'd like it to be that way for djot as well, where projects should only be using CommonMark if legacy considerations / limited resources prevent them from migrating to djot.

As best I can tell, this detail about newlines is the main language design reason I've seen that would motivate someone to start a new project using CommonMark rather than djot, and I think @andersk has articulated clearly the crux of the issue -- that there are actually two different use cases for a Markdown-style processor here with slightly different requirements. I would much prefer for significant-newlines functionality to be part of djot (or at least a specified extension/variant) rather than a "separate language", to avoid fracturing the ecosystem.

On the naming point -- obviously this is entirely up to @jgm, but I think the name "djot with significant newlines" would work well; I could easily see a project saying "We use djot with significant newlines" in the same way that projects today might say "We use GitHub-flavored Markdown" or "We use CommonMark with a few extensions" to provide a clear, concise description of what markup language they're using.

gavinhoward commented 1 year ago

> I would much prefer for significant newlines functionality to be part of djot (or at least a specified extension/variant) rather than a "separate language", to avoid fracturing the ecosystem.

I can see your point here. Ultimately, it comes down to a judgment call, and only @jgm would be able to tell us what he wants.

jgm commented 1 year ago

It's an interesting suggestion, and I remember we talked about something similar in the early discussions leading to commonmark.

I think losing a single uniform syntax is a heavy cost, so I'd be reluctant to go this way, but I will think about it some more.

I also wonder how much of the resistance to the current syntax is just based on familiarity. Once you get used to adding a blank line before a list, it's not a big thing. (And the source then more closely resembles what you find in printed books, which always set off lists from the main text.) People complained when GitHub changed heading syntax (to conform to commonmark) so that a space was required after #, but I don't think too many people care about that now, since they've gotten used to it.

The objective in a lot of my work with markdown/commonmark was: "try to do what the writer intended in the vast majority of cases." That's going to lead to a syntax with lots of quirks and rough edges (and one that, inevitably, still won't magically always do what the writer intended). Djot has somewhat different aims. It wants to be principled and uniform. It expects that the user has read the syntax guide.

It would be good to write down exactly what would change in the envisioned special mode. Is it this?

  1. Lists, blockquotes, headings, and other block constructions can interrupt paragraphs.
  2. Newlines within paragraphs create hard breaks.

Is that it?
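To make the first rule concrete, here is a deliberately simplified sketch of block splitting with and without paragraph interruption. It is not taken from any djot implementation: `split_blocks`, `BLOCK_START`, and the small set of markers it recognises are assumptions made for illustration.

```python
import re

# Hypothetical, simplified set of block openers: bullets, decimal ordered
# items, block quotes, and headings. Real djot recognises more than this.
BLOCK_START = re.compile(r"^(?:[-*+] |\d+[.)] |> |#+ )")


def split_blocks(source: str, significant_newlines: bool) -> list[list[str]]:
    """Group source lines into blocks (each block is a list of lines)."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in source.splitlines():
        if not line.strip():
            # A blank line ends the current block in both modes.
            if current:
                blocks.append(current)
                current = []
            continue
        starts_block = bool(BLOCK_START.match(line))
        in_paragraph = bool(current) and not BLOCK_START.match(current[0])
        if significant_newlines and in_paragraph and starts_block:
            # Proposed mode only: a block opener interrupts the paragraph.
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    src = "Shopping:\n- eggs\n- milk"
    print(split_blocks(src, significant_newlines=False))
    print(split_blocks(src, significant_newlines=True))
```

In the default mode the three lines stay together as one paragraph; in the proposed mode the paragraph ends at the first list marker. The second rule (hard breaks) would then apply when rendering each paragraph block.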

andersk commented 1 year ago

Yeah, I think that’s it.

(A question occurred to me about whether blocks should still interrupt after backslash-newline in this mode:

paragraph\
- list item?

I guess they still should, for consistency with this existing behavior?

> blockquote\
- list item

But nobody’s likely to write this, so whatever makes the spec cleanest is fine.)

Shados commented 1 year ago

> The users who complain about blank lines before lists are more likely to be writing in the latter mode, which is also the mode where hard-wrapping doesn’t make sense.

Personally, most of my use of commonmark is in the former mode, but the requirement for blank lines before lists still annoys me endlessly, even though I do hard-wrap my content. I'd much prefer that neither lists nor sublists require a preceding blank line, and that the "interrupted a paragraph with an unintended list" case be solved by escaping; editors' wrapping tooling could well be adapted to detect and automatically escape for this case.
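A sketch of what such wrapping tooling might do, assuming a backslash before an ASCII punctuation character escapes it. The function name `wrap_and_escape` is hypothetical, and only bullet and decimal ordered markers are handled here.

```python
import re
import textwrap

# A bullet ("-", "*", "+") or a decimal ordered marker ("1." / "1)") at the
# start of a line, followed by a space or the end of the line.
LIST_MARKER = re.compile(r"^(?:([-*+])|(\d+)([.)]))(?= |$)")


def wrap_and_escape(paragraph: str, width: int = 72) -> str:
    """Hard-wrap a paragraph, escaping accidental list markers at line starts."""
    # break_on_hyphens=False keeps the wrapping purely whitespace-based.
    lines = textwrap.wrap(paragraph, width=width, break_on_hyphens=False)
    fixed = lines[:1]  # the first line is where the author put it; leave it alone
    for line in lines[1:]:
        match = LIST_MARKER.match(line)
        if match:
            if match.group(1):
                # Bullet marker: put the backslash in front of it.
                line = "\\" + line
            else:
                # Ordered marker: escape the delimiter after the digits.
                digits = match.group(2)
                line = digits + "\\" + line[len(digits):]
        fixed.append(line)
    return "\n".join(fixed)


if __name__ == "__main__":
    print(wrap_and_escape("aaaaaa - bbb", width=7))
    print(wrap_and_escape("aaaaaa 1. bbb", width=7))
```

With this approach, re-wrapping never turns ordinary text into a list, and a list marker the author typed at the start of a line would stay significant.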

LemmingAvalanche commented 1 year ago

OP, that people write in different modes is such a good point. Lightweight markup languages should be both readable and easily writeable, and sometimes these two points are in tension.

The problem though with creating two modes is that people then always have to put adjectives or disclaimers in or around the proper name.

In a perfect world with infinite resources (speaking from the peanut gallery here), it would be nice to have two distinct entities:

  1. This language with its current name
  2. Your proposed new “mode”

And for them to have different names.

Either that or something like djot/.

Of course you would have (see previous: infinite resources) utilities to convert between the two languages at any time. So then the only potential confusion would be if you are in a txt document and you forgot which variant you are currently in. ;)

Unfortunately though the hangup about the required blank line before a sublist would remain unless those who don’t like that rule also don’t like to hard-wrap (seems unlikely).

andersk commented 1 year ago

@LemmingAvalanche My point was that people are inevitably going to use Djot in these two modes anyway (even if only by feeding the output through a newline-to-<br> postprocessor or a white-space: pre-line CSS rule). They already do this with Markdown—again, see the Markdown flavor in which we’re having this very discussion.

If we support both modes, we won’t need any additional resources to convert between them; that will automatically be possible by running the existing bidirectional Djot ↔︎ AST conversion, forwards in one mode and backwards in the other.

stoicon commented 1 month ago

The concern of most users is that a blank line is needed before a sublist. So, what about adding an escape mode only for lists, which would make it possible to write a sublist without a blank line?

For example:

*```
- top
  - inner
- top
  - inner
- top
  - inner

This is similar to the Math syntax used in djot.

$``` e=mc^2

x^n + y^n = z^n

(Instead of an asterisk, some other character can also be used.)