jgm / djot

A light markup language
https://djot.net
MIT License
1.66k stars 43 forks

Revisit the concept of tight/loose list. #138

Open crlf0710 opened 1 year ago

crlf0710 commented 1 year ago

This is an issue inherited from CommonMark: there's a concept of a tight list, where each <li> directly contains inlines (leaf block), vs. a "loose" list, where each <li> contains a paragraph that contains inlines (container block). In CommonMark, loose lists have items separated by blank lines, while tight lists do not. So this is a property of the list itself.

But this design doesn't actually serve its purpose well. The end product is determining whether each list item is a leaf block or a container block. When a blockquote or a verbatim block is within a list item, it has to be a container, even within a tight list. At this time the html output will be half container and half leaf, e.g.

1. item one
   > xxx
2. item two

outputs

...
<li>item one
<blockquote>
<p>xxx</p>
</blockquote>
</li>
...

This mix of inline and block is unnecessarily complex and not well supported in many output formats.

Also, this leads to the "last-minute loose" issue, where a blank line between the 99th and 100th list items can turn the whole list from tight to loose. Every renderer has to wait until the last list item comes up before it can determine the actual output of the very first list item.

I hope we can remove the concept of tight/loose lists and replace it with heuristics that determine the container/leaf-ness of each list item with good locality. The same list could then have some items that are containers and some that are leaves, but no single item would be both at the same time.
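To make the locality argument concrete, here is a minimal sketch contrasting a list-level rule with an item-level one. The `Item` type and the "single paragraph means leaf" heuristic are assumptions made for illustration; they are neither djot's actual AST nor a rule proposed in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Item:
    """Hypothetical parsed list item (not djot's real AST)."""
    blocks: List[str]               # block kinds, e.g. ["para", "blockquote"]
    blank_line_after: bool = False  # blank line between this item and the next


def whole_list_is_tight(items: List[Item]) -> bool:
    """List-level rule: a blank line between any two items loosens everything.

    The whole list must be available before the first item can be rendered.
    """
    return not any(item.blank_line_after for item in items[:-1])


def per_item_kinds(items: Iterable[Item]) -> Iterator[str]:
    """Item-level rule (assumed heuristic): decide from the item's own content.

    Works on a stream, because no item depends on a later one.
    """
    for item in items:
        yield "leaf" if item.blocks == ["para"] else "container"
```

With 100 single-paragraph items and a blank line only between items 99 and 100, `whole_list_is_tight` flips to `False` for the whole list, while `per_item_kinds` still yields `"leaf"` for every item as soon as it is seen.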

matklad commented 1 year ago

Tentative +1. I've tried to make tight/loose distinction work for my blog, but that was quite a bit of hassle for no real gain, so I render all lists the same. Basically, I am a confused user who walked away from the feature.

It does seem to me that delegating that to the writer's (heuristics) + a couple of classes to override

{.loose}
- foo
- bar
- baz

would be better.

matklad commented 1 year ago

Oook, so after putting two and two together, it was this second that I realized, Molière-style, that the tight/loose list distinction comes from Markdown. I've been using Markdown for I don't know how long, and it's the first time I've realized that this feature exists. In retrospect, this explains why sometimes my lists look off in Markdown (they are accidentally loose)!

I now feel strongly that we should remove it )

chrisjsewell commented 1 year ago

+1, I feel most Markdown users don't know about / understand this syntax feature, and thus it leads to surprising results

jgm commented 1 year ago

The distinction doesn't have anything to do with leaf/container nodes. Note here that the AST is exactly the same for these two cases, except for the tight attribute on the list.

- one
- two

doc
  list list_style="-" tight="true"
    list_item
      para
        str text="one"
    list_item
      para
        str text="two"
references = {
}
footnotes = {
}

- one

- two

doc
  list list_style="-" tight="false"
    list_item
      para
        str text="one"
    list_item
      para
        str text="two"
references = {
}
footnotes = {
}

jgm commented 1 year ago

The distinction relates only to how the list is formatted in the output format. (And an output format that doesn't support the distinction---some don't---could just ignore it.)

The way it's handled will vary from one output format to another. In HTML, it involves removing <p> tags. In LaTeX, it involves setting some length parameters.

Of course, it might be possible to handle it a different way in HTML -- leaving the <p> tags and adding a CSS class, then adjusting things with CSS. It would be fine for a renderer to do this instead of what the default HTML renderer does (which is similar to what Markdown.pl did).
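As a rough illustration of the two HTML strategies described here, the sketch below renders a list of single-paragraph items either by dropping `<p>` in tight lists or by keeping `<p>` and adding a class. The function name, its parameters, and the class name `tight` are all hypothetical; this is not the code of djot's default HTML renderer.

```python
from html import escape
from typing import List


def render_list(items: List[str], tight: bool, use_css_class: bool = False) -> str:
    """Render a bullet list whose items are single paragraphs of plain text.

    `items`, `tight` and `use_css_class` are illustrative names, not djot's API.
    """
    if use_css_class:
        # Strategy B: always keep <p>; mark tight lists with a class for CSS.
        # The class name "tight" is an assumption.
        open_tag = '<ul class="tight">' if tight else "<ul>"
        body = [f"<li><p>{escape(text)}</p></li>" for text in items]
    else:
        # Strategy A: drop <p> in tight lists (the Markdown.pl-style approach).
        open_tag = "<ul>"
        if tight:
            body = [f"<li>{escape(text)}</li>" for text in items]
        else:
            body = [f"<li><p>{escape(text)}</p></li>" for text in items]
    return "\n".join([open_tag, *body, "</ul>"])
```

Either way the AST is the same; only the renderer's reading of the `tight` attribute differs, and a renderer for a format without the distinction could ignore the flag entirely.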

jgm commented 1 year ago

The main question is whether there should be a mechanism to indicate this distinction in the source document. If the answer is yes, then the source readability desideratum strongly favors the way we currently indicate it (by looking for blank lines) over something like an explicit class .loose. (Avoiding English-centric directives is another design desideratum.)

bpj commented 1 year ago

Of course, it might be possible to handle it a different way in HTML – leaving the <p> tags and adding a CSS class, then adjusting things with CSS. It would be fine for a renderer to do this instead of what the default HTML renderer does (which is similar to what Markdown.pl did).

My gut feeling is that relying on CSS to remove the (normal) effect of tags should not be the default; if one doesn’t want a tag to make a difference one shouldn’t include the tag at all. That is and must be the default assumption.

Inserting whitespace through CSS is another matter though. In CSS margins on nested elements, unlike padding, aren’t additive; rather the widest margin on any of the nested elements “wins”, so it would be “safe” to only wrap list item content in paragraphs when a list item actually contains multiple blocks and then use CSS like this:

p, ul.loose > li, ol.loose > li {
    margin-bottom: 1em;
}

The actual whitespace seen will be only one em even when the last child of a <li> is a <p>.


I very much think that a feature shouldn’t be removed just because people can’t be bothered to read the manual. It can’t be too much to ask that users read the syntax description, where the difference between tight and loose lists is described clearly enough. That said, the loose-list feature is not without its problems, mainly that if any item in a list contains more than one block, the whole list becomes loose, i.e. the content of every list item is wrapped in <p> tags. That is frequently more than I for one want, so for that reason it might be better to let users explicitly request a loose list through a class or some less subtle syntax than the presence of blank lines within the list (though I have no idea what that syntax might be), letting renderers pick up an attribute on the list element object and implement looseness however suits the output format.

mcookly commented 1 year ago

It's been a while since this has been discussed, so I'm not entirely sure where things currently stand, but I'll throw a thought out there anyway.

Currently, the djot syntax allows for multiple lists without any whitespace between them. So

- One
- Two
- Three
* A
* B
* C

would become two separate lists in HTML.

Would it not be more consistent with syntax to have a blank line between different lists? This could also let one of the bullet list indicators stand as a way of designating a loose item in a list. For example,

- Tight
+ Loose
- Tight

i. Tight (ordered)
+ Loose (ordered)
i. Tight (ordered)

The type of list should already be known (if I understand the AST correctly) and applied to the loose item. This doesn't work though if an entire ordered list should be loose. In that case, it might be better to have an ordered loose list item designated by i+. or something like that. For example,

i+. Loose (ordered)
i. Tight (ordered)
i+. Loose (ordered)

The lack of alignment in the items isn't nice though, either. I suppose that could be fixed by using a slightly different syntax for ordered lists, like so:

+i. Loose (ordered)
-i. Tight (ordered)
+i. Loose (ordered)

Just an idea I wanted to throw out there.

Edit: Consistency in my examples.

vassudanagunta commented 1 year ago

@jgm I found this issue only after I posted #201 -- the Discussion search doesn't include Issues unfortunately. I can't delete that discussion. Nonetheless should I just copy my proposal there to here and turn that into an empty discussion?

jgm commented 1 year ago

I think you could just copy your proposal here and put a link to this issue on the discussion topic.

vassudanagunta commented 1 year ago

proposed principles

In my own plain text syntax work I'm at least 90% settled on the following principles:

loose/compact as rendering specific decisions

It's easy for a renderer to determine whether list items in a particular presentation should or must be separated from each other for visual clarity. For example, a list item that contains multiple paragraphs should be separated by extra white space from adjacent list items to avoid misleading visual groupings -- but if the list items were rendered with alternating backgrounds then it might not be necessary. If the deciding factor is purely aesthetics, that too should happen in rendering specific decisions, e.g. CSS.

It's also important to recognize that plain text and graphical renderings have different presentational needs with regard to blank lines or vertical white space. Even different plain text syntaxes, such as Setext and ATX headings, have different needs. In Markdown,

- item one
- item two
  ## a heading
  another block of
  text
- item three

results in a tight list, while

- item one
- item two

  a heading
  ---------
  another block of
  text
- item three

results in a loose one, but they both should look the same in HTML. Whatever the plain text syntax white space rules, they shouldn't dictate anything about renderings in other forms.

CommonMark had its hands tied because of Markdown precedent. There is no reason for djot to be encumbered by this.

how this would work for djot, expressed in djot

This list happens to be compact in djot syntax:

* It can be rendered in HTML compactly (without
  `p` tags) because:
  * each item is a single chunk of text
  * like in a table cell. There is no nesting
    of blocks
* But a rendering can choose differently, either
  in its output structure (e.g. `p` tags) or via
  stylesheet.
* If a rendering decision *must* be made in the
  plain text, it should be made with the standard
  djot mechanism for this: *block attributes*.
  See below.

This next list has blank lines only because djot
syntax requires it, not because a loose list was
desired:

* Any blank lines required by a plain text syntax
  (as opposed to the text's intrinsic structure,
  e.g. paragraphs) should have zero impact on any
  renderings into other formats.

  * I have a preceding blank line *not* because I
    want to be loose, but for consistency with
    djot's "Paragraphs can never be interrupted by
    other block-level elements" rule.
  * I have an internal blank line *not* because I
    want to force a loose list, but because in djot

    > Paragraphs can never be interrupted by
    > other block-level elements
* A rendering, though, can make its own choices,
  be it through the rendered structure or application
  of stylesheet.

{.loose-list}
* If the rendering choice needs to be made in the
  plain text, use a block attribute, which can
  override the renderer's choice, whatever it is.
  * A smart HTML renderer would render this list
    compactly by default.

  {.highlight}
  * 🌶 Critically, the insertion of a block attribute
    and its required preceding blank line has no
    impact in and of itself.
  * That would defeat the purpose!
  * Neither the outer nor inner lists are impacted by
    the block attribute's presence, only its content.

This proposal not only separates content, syntax and presentation concerns, it disentangles djot syntax from the loose/compact list issue. Overloading blank lines that way (djot block element delimiter and loose list indicator) is simply a recipe for conundrums, paradoxes and befuddlement.

vassudanagunta commented 1 year ago

The last point I make would resolve Issue https://github.com/jgm/djot/issues/200. It will disentangle https://github.com/jgm/djot/discussions/183's resolution from list looseness/compactness.

I definitely agree with @matklad:

It does seem to me that delegating that to the writer's (heuristics) + a couple of classes to override

{.loose}
- foo
- bar
- baz

would be better.

jgm commented 1 year ago

I think I like the idea that this is for the renderer to decide. But having a way to override it also seems potentially important, and there we get into the issue of English words again. Maybe it's okay if it's just a generic instruction for the renderer, and something that wouldn't need to be used much?

vassudanagunta commented 1 year ago

If the override is via a block attribute and a classname, doesn't that avoid introducing English into djot? Or is the concern that djot's built-in renderers will have hardcoded English classname dependencies?

jgm commented 1 year ago

Or is the concern that djot's built-in renderers will have hardcoded English classname dependencies?

Yes.

mcookly commented 1 year ago

Leaving the renderer to decide an arbitrary attribute label for tightness/looseness seems like a good idea. Another option may be to indicate a tight list by using a hard linebreak:

- Hi
- I'm a
- loose list

- Hi,\
- Loose list,

  I'm not a\
- loose list

But all in all, I think the simplicity of using an attribute is ideal. I really haven't seen many Markdown users utilize the distinction. If it's really needed, I believe granular control of spacing is possible through in-list attributes or using multiple lists.

waldyrious commented 10 months ago

@jgm note that the way the last line of commit fc4dff77d61566882da9742edb9f3b7ccdee3fbe was phrased resulted in #249 being automatically closed, but not this one nor #200.

jgm commented 10 months ago

Well, that's okay, because I've had second thoughts now and reverted the change!

jgm commented 10 months ago

In general, I don't like the idea of using English-language attributes to control parsing. The spirit of light markup is to rely on features of the source document that reflect the desired distinction. Blank lines between list items is a great example of such a feature.

If we wanted to completely remove the tight/loose distinction, or make this completely up to the renderer, with no opportunity for the author to adjust it on a list-by-list basis, that would be one thing. But making a distinction in the renderer that can be overridden by an attribute looks like a step backwards to me. Why not just stay with the current method of looking at blank lines between items?

On the other other hand, I'm not sure I'm on board yet with the idea of just letting the renderer decide in a non-overridable way. The simplest algorithm would be: if the list consists entirely of items that are single paragraphs, or one paragraph followed by another list, then it's tight/compact. But sometimes when I have particularly long paragraphs in my list items, I don't want them to be rendered in a compact way. This would be the kind of case in which I'd want to be able to override the default. Of course, one could try to make the renderer's decision more subtle (measuring paragraph length etc.), but that's bound to be a stylistic decision that won't be to everyone's liking.
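The "simplest algorithm" above can be written down in a few lines. This is only a sketch: each item is modelled as a list of block-kind names, which is an assumed representation rather than djot's AST.

```python
from typing import List

# Each item is the list of block kinds it contains, e.g. ["para", "list"].
# This flat representation is an assumption made for the sketch.
Item = List[str]


def is_compact(items: List[Item]) -> bool:
    """True if every item is one paragraph, optionally followed by one list."""
    return all(blocks in (["para"], ["para", "list"]) for blocks in items)
```

Note that the rule looks only at block kinds, so a list of very long single paragraphs still counts as compact, which is exactly the case where an override would be wanted.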

bpj commented 10 months ago

For me, in my experience with Pandoc Markdown, the annoying thing is that the content of all list items gets wrapped in paragraphs/becomes "loose" if only a single item contains multiple blocks. I have never really understood why. Space between list items is a separate issue entirely if an issue at all; it is IMO entirely reasonable if that causes a list to be “loose”.

mikeando commented 6 months ago

I agree that in Markdown, accidentally switching between tight and loose lists is painful when all you're trying to do is make the raw document more readable.

To me the question of whether djot could use {.loose} instead comes down to whether a list being tight or loose should be a style type decision, or part of the djot document model.

IMO it is a style decision. I can imagine cases where renderers might not want to differentiate between those cases, or might even want a finer differentiation between list renderings. (I guess the same is true for many parts of the language - so maybe that argument is moot).

If djot were to lose the tight/loose distinction, then {.loose} would just be a default style that djot recommends renderers apply, with further suggestions on how to apply it for HTML output. I suspect that it would slightly simplify parsing of lists too, as you no longer need to keep track of whether some child element is causing the list to be loose, which may be particularly relevant for event-based parsing.

Treating it as a style then gives a sensible point of internationalisation or customisation: through renderer settings, rather than needing to address it directly. If you want to annotate loose lists as {.XYZ}, that's just a renderer configuration.

In that way it becomes more like {.note} on a paragraph that a renderer might handle differently, and in an international situation the exact tag used would want to be configurable in some way.
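A minimal sketch of what "just a renderer configuration" could look like: the renderer is told which class names mean "loose", so the label can be in any language. `RendererConfig`, `loose_classes` and `wants_loose` are hypothetical names, not part of any djot implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class RendererConfig:
    """Hypothetical renderer settings; the class names are configurable."""
    loose_classes: FrozenSet[str] = frozenset({"loose"})


def wants_loose(list_classes: Set[str], config: RendererConfig) -> bool:
    """True if any class on the list matches a configured 'loose' label."""
    return bool(list_classes & config.loose_classes)
```

A German-language project could, for example, construct `RendererConfig(loose_classes=frozenset({"locker"}))`, after which `{.locker}` marks a loose list and `{.loose}` is treated as an ordinary class.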

rauschma commented 4 months ago

I’m also in favor of not making the distinction between loose lists and tight lists in djot:

If loose/tight lists are to be supported, this algorithm makes most sense to me:

The simplest algorithm would be: if the list consists entirely of items that are single paragraphs, or one paragraph followed by another list, then it's tight/compact.

jgm commented 4 months ago

Djot supports attributes for lists – so there is an easy way to support tightness.

Although this is true, it is nicer to allow distinctions to be made without English-language labels. That is what makes me inclined to keep the distinction.

I’d rather change how a tight list is displayed via CSS than by omitting paragraph tags.

Is there a reliable way to do this that gives the same appearance as omitting paragraph tags across browsers?

The simplest algorithm would be: if the list consists entirely of items that are single paragraphs, or one paragraph followed by another list, then it's tight/compact.

I don't think this is good, because when the paragraphs get to a certain length, one generally wants some space between items.

rauschma commented 4 months ago

Is there a reliable way to do this that gives the same appearance as omitting paragraph tags across browsers?

What I find tricky is that omitting paragraph elements produces mixed results – e.g., a tight list inside a loose list is rendered like this:

• First

  • First A
  • First B

• Second

• Third

And not like this:

• First
  • First A
  • First B

• Second

• Third

Maybe we want tight lists plus an optional class for specifying vertical space between list items?

This is what can be done with CSS (caveat: my knowledge of CSS is limited)

bpj commented 4 months ago

Although this is true, it is nicer to allow distinctions to be made without English-language labels.

If it is up to the renderer the labels can be in any language: the renderer can be written or configured to recognise labels in the author's language.

It's a weak argument also because once you have attributes they have to be text in some language. It is a nice idea to avoid textual markup, but it is hard to express things with punctuation and whitespace once you go beyond the basics. It is great for the basics though! :-)

I wonder if it wouldn't be possible, and easier to understand, if you mark individual list items as loose by starting the text on the line below the list marker? Not very pretty but easy to see.

jgm commented 4 months ago

Sure, it could be in any language. But suppose you have a document with hundreds of lists, some tight, some loose. Do you really want to be adding these attributes (which look like "markup" not text), when you could just be writing something in the source that already looks like the result you want?

jgm commented 4 months ago

I wonder if it wouldn't be possible, and easier to understand, if you mark individual list items as loose by starting the text on the line below the list marker? Not very pretty but easy to see.

Very ugly I'd say!

faelys commented 4 months ago

I wonder if it wouldn't be possible, and easier to understand, if you mark individual list items as loose by starting the text on the line below the list marker? Not very pretty but easy to see.

Very ugly I'd say!

Is it really that ugly when you enforce indentation at the beginning of loose list items and make optional all line-skipping between items?

+
  First Loose Item

  - First Tight Item
  - Second Tight Item
+
  Second Loose Item
+
  Third Loose Item

rauschma commented 4 months ago

How about these rules for tight lists?

See last example here: https://codepen.io/rauschma/pen/mdgXEKb

rauschma commented 4 months ago

Another issue with tight lists depending on the absence of empty lines is that nested lists must contain an empty line before the inner list. That is, in the following example, the outer list is tight but it doesn’t look tight:

1. Apple

   * Juice
   * Sauce
2. Orange
3. Banana

The empty line after “Apple” is mandatory.

Solution 1: one-paragraph tightness

With these rules, I’d probably write the previous example like this (which, IMO, looks neater because it’s more regular):

1. Apple

   * Juice
   * Sauce

2. Orange

3. Banana

Solution 2: list item bullet that can interrupt paragraphs

The current bullets (1., 1), *, -, etc.) must be separated from paragraphs with empty lines because they might continue a paragraph line when doing hard wrapping:

This book is in my top
10. Only if there are very few (2 or
3) people. Multiplication: `3
* 2`.
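The ambiguity can be checked mechanically. The sketch below uses a deliberately simplified marker pattern (bullets and decimal numbers only; real djot accepts more marker styles) to find lines of a hard-wrapped paragraph that would start a list item if markers were allowed to interrupt paragraphs. The pattern and function name are assumptions for illustration.

```python
import re
from typing import List

# Simplified marker pattern: "*", "-", "+", or digits followed by "." or ")",
# then whitespace or end of line. This is NOT djot's full marker grammar.
MARKER = re.compile(r"^(?:[*+-]|\d+[.)])(?:\s|$)")


def lines_that_look_like_items(paragraph: str) -> List[str]:
    """Return the lines of a hard-wrapped paragraph that start like a list item."""
    return [line for line in paragraph.splitlines() if MARKER.match(line)]
```

Applied to the four-line paragraph above, every line except the first matches, which is why a blank line is currently required before a list.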

It’s conceivable to come up with bullets that don’t have this issue. Sadly, they don’t look that good and are less obvious than 1. and *.

<1> Apple
    <*> Juice
    <*> Sauce
<2> Orange
<3> Banana

:1 Apple
   :* Juice
   :* Sauce
:2 Orange
:3 Banana