jgm / djot

A light markup language
https://djot.net
MIT License

New vs continuing paragraph after block quote or other set-off content #50

Open jgm opened 2 years ago

jgm commented 2 years ago

Here are two kinds of texts we might want to distinguish:

paragraph content

> block quote

continuation of paragraph

vs

paragraph content

> block quote

new paragraph

A deficiency of Markdown is that there is no way to distinguish these cases. The problem is reduced if one renders in a format that does not indent new paragraphs, because then there is no visual distinction between the cases. But they are semantically different and can be distinguished, e.g., in print output with indented paragraphs. There should be a way to distinguish them in the source.

The problem is not raised only by block quotes but occurs also with set-off equations, images, tables, code, and lists.

I recently found myself creating a pandoc Lua filter that implements the following syntax for the "continued paragraph case":

paragraph content

> block quote

_ continuation of paragraph

(The filter just inserts a LaTeX \noindent command where the _ is.) This is not too bad actually. It would be nice if djot had some way of making the distinction.
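As a rough illustration of the idea (not the actual filter, which is a pandoc Lua filter and is not shown in this thread), here is a toy Python model. It assumes paragraphs arrive as plain strings and that a leading `_ ` is the continuation marker; the function name `render_paragraphs` is made up for this sketch.

```python
def render_paragraphs(paragraphs):
    """Render paragraph strings to LaTeX source.

    Toy model only: a paragraph that starts with the marker "_ " is
    treated as a continuation and gets a \\noindent prefix.
    """
    out = []
    for text in paragraphs:
        if text.startswith("_ "):
            # Continuation: drop the marker and suppress the paragraph indent.
            out.append("\\noindent " + text[2:])
        else:
            # Ordinary paragraph: emitted unchanged.
            out.append(text)
    # A blank line separates paragraphs in LaTeX source.
    return "\n\n".join(out)


print(render_paragraphs(["paragraph content", "_ continuation of paragraph"]))
```

Note that the marker is only recognized when followed by a space, so a paragraph beginning with `_emphasis_` is left alone in this sketch.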

bpj commented 2 years ago

Why not indentation for an embedded blockquote? That seems the most intuitive to me.

jgm commented 2 years ago

The question is not about the syntax of the block quote, but about how to mark what follows it as either a new paragraph or a continuation of the previous one.

bpj commented 2 years ago

I mean that if what follows the blockquote is a continuation, the blockquote is indented; an indented blockquote is one that is embedded in a paragraph.

paragraph

    > blockquote inside paragraph

rest of paragraph (continuation)

vs.

first paragraph

> blockquote after paragraph

another paragraph

I hope that is clearer.

jgm commented 2 years ago

Yes, got it now.

vassudanagunta commented 1 year ago

Does djot's AST support block elements nested within a paragraph?

jgm commented 1 year ago

We wouldn't need the AST to support block elements as children of a paragraph. It would be sufficient just to be able to mark the following content as "not a new paragraph."

uvtc commented 1 year ago

Thinking about a syntax for, "anyhow, as I was saying", I was going to suggest ..., as in:

The boat ride took us through the everglades.

> It was one of those "airboats" with the giant propeller.

... We saw a lot of birds but no alligators.

But that causes a pretty big indent, and ... already automatically gets you a "…" in djot, and it might cause problems when the author wants an actual ellipsis.

The leading underscore is ok, but it also makes me think of italics.

Since "and" is at least somewhat close to "anyhow, as I was saying", maybe &?

The boat ride took us through the everglades.

> It was one of those "airboats" with the giant propeller.

& We saw a lot of birds but no alligators.

I like that one because,

vassudanagunta commented 1 year ago

@jgm,

> We wouldn't need the AST to support block elements as children of a paragraph. It would be sufficient just to be able to mark the following content as "not a new paragraph."

I understand. Would you mind answering a related, long-standing question I've had about terminology?

Is there a distinction between an abstract syntax tree and an intermediate representation? Since djot parses to an AST, and since you are proposing a new djot syntax for paragraph continuation, your suggested approach above makes sense. But if instead you needed to model a general abstraction of structured text, independent of any specific syntax, such as the "AST" at Pandoc's core, then it might be better to represent it as a single paragraph with a nested block quote, yes? And whether or not that is the better representation of this specific case, would you agree that there is nonetheless a difference between an AST and an IR, and that the core data structure of Pandoc is better characterized as an IR?

jgm commented 1 year ago

It might make more sense conceptually to allow a block quote to be a child of a paragraph. But this would make the interface with Pandoc's types more complicated. I don't know what is best.
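The two options can be related mechanically. The sketch below assumes a flat block list in which a paragraph carries a hypothetical `continues` flag ("not a new paragraph"), and derives the nested, "logical paragraph" grouping from it. `Block`, `continues`, and `group_logical` are names invented for this illustration; none of them is part of djot's or Pandoc's actual types.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str                # e.g. "para", "blockquote", "list"
    text: str = ""
    continues: bool = False  # hypothetical flag: "not a new paragraph"


def group_logical(blocks):
    """Group a flat block list into logical units.

    Each unit is a list of blocks. A paragraph flagged `continues` is merged
    into the preceding paragraph's unit, together with the set-off blocks
    that sit between them.
    """
    groups = []   # finished and open units
    pending = []  # set-off blocks whose owner is not yet known
    for block in blocks:
        if block.kind != "para":
            pending.append(block)
        elif block.continues and groups and groups[-1][0].kind == "para":
            # Continuation: the pending blocks were inside the paragraph.
            groups[-1].extend(pending + [block])
            pending = []
        else:
            # New paragraph: the pending blocks stand on their own.
            groups.extend([b] for b in pending)
            groups.append([block])
            pending = []
    groups.extend([b] for b in pending)
    return groups


flat = [
    Block("para", "paragraph content"),
    Block("blockquote", "block quote"),
    Block("para", "continuation of paragraph", continues=True),
]
print([[b.kind for b in unit] for unit in group_logical(flat)])
```

Under this assumption the flat form keeps the interface with existing block types simple, while a consumer that wants the nested view can reconstruct it.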

About terminology, I'd say that "IR" is the genus and "AST" is one species.

vassudanagunta commented 1 year ago

> About terminology, I'd say that "IR" is the genus and "AST" is one species.

ok, thank you.

RE the bigger question, some things to consider:

  1. I would say that Pandoc solves the problem of translation between so many different input and output forms by defining an IR that is syntax-independent and more or less a semantic superset of those syntaxes. Then the question becomes whether representing paragraphs that span block quotes (or other elements, see below) is universal or common enough to warrant complicating Pandoc's IR.

  2. An old W3C www-html list discussion: Re: Lists within Paragraphs. An excerpt:

    I think this is part of a bigger problem.  Paragraph's can't contain block
    level elements.  At first this seems to make a lot of sense.  But it
    doesn't work in many instances.
    
    For example often block level mathematical formulas occur in paragraphs.
    If we consider
    
                      x + y = z
    
    as such an example, we see that in this case this paragraph is the still
    the same one, but we have a block level element in it.

  3. The HTML spec's ultimate answer concedes that a paragraph might logically span block elements, but holds that this logical notion does not apply to paragraphs as HTML defines them:

    List elements (in particular, ol and ul elements) cannot be children of p elements. When a sentence contains a bulleted list, therefore, one might wonder how it should be marked up.

    For instance, this fantastic sentence has bullets relating to

    • wizards,
    • faster-than-light travel, and
    • telepathy,

    and is further discussed below.

    The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.

    The markup for the above example could therefore be:

    <p>For instance, this fantastic sentence has bullets relating to</p>
    <ul>
    <li>wizards,
    <li>faster-than-light travel, and
    <li>telepathy,
    </ul>
    <p>and is further discussed below.</p>

    Authors wishing to conveniently style such "logical" paragraphs consisting of multiple "structural" paragraphs can use the div element instead of the p element.

    Thus for instance the above example could become the following:

    <div>For instance, this fantastic sentence has bullets relating to
    <ul>
    <li>wizards,
    <li>faster-than-light travel, and
    <li>telepathy,
    </ul>
    and is further discussed below.</div>

    This example still has five structural paragraphs, but now the author can style just the div instead of having to consider each part of the example separately.

  4. Allowing paragraphs to span/nest block elements provides, I think, a cleaner and more consistent solution to "tight lists". For example, the following would be a tight list because each list item contains exactly a single element (a paragraph):

    - para 1
    - para 2
      - a
      - b
      - c
    - para 3

    The current CommonMark solution has flaws, as can be seen by comparing

    - item one
    - item two
      # a heading
      more text
    - item three

    with

    - item one
    - item two
    
      a heading
      ---------
      more text
    - item three

    Both should be treated as loose lists, since the second item in each contains a sequence of blocks, but CommonMark's determination is based on the presence or absence of blank lines in the source, not on logical structure.
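The structural rule suggested in point 4 can be stated in a few lines. This is only a sketch of that suggestion, not CommonMark's or djot's actual algorithm: it assumes each list item has already been parsed into a list of child block kinds, and that a sublist embedded in a paragraph counts as part of that single paragraph. The function name `is_tight_structural` is hypothetical.

```python
def is_tight_structural(items):
    """Return True if every list item holds exactly one child block.

    `items` is a list of list items; each item is a list of the kinds of
    its direct child blocks, e.g. ["para"] or ["para", "heading", "para"].
    """
    return all(len(children) == 1 for children in items)


# "para 2" with its embedded sublist is modelled as one paragraph.
nested_in_para = [["para"], ["para"], ["para"]]

# "item two" followed by a heading and more text is three blocks.
with_heading = [["para"], ["para", "heading", "para"], ["para"]]

print(is_tight_structural(nested_in_para), is_tight_structural(with_heading))
```

Here the result depends only on the parsed structure, so both spellings of the heading example would be classified the same way regardless of blank lines.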

I hope this is helpful. Please let me know if you've had enough! It just happens to be a question I've been trying to tackle myself.

dsanson commented 1 year ago

Just commenting to second @bpj's proposed use of indentation for this.

uvtc commented 1 year ago

Re. @bpj's suggestion about indenting: would this cause a problem with putting lists between paragraphs? That is, with a list you may (and typically do) indent the list marker. Is there a difference between a list that's its own paragraph vs a list that's in the midst of a paragraph?

david-christiansen commented 11 months ago

Including lists in paragraphs is an important use case for the kind of writing that I do, at least, and neither the suggestion of a leading _ nor the suggestion of indentation works well for this case.

I would need to distinguish between all of the following:

A:

Some text:
\begin{itemize}
\item A
\item B
\end{itemize}
And more text

B:

Some text:

\begin{itemize}
\item A
\item B
\end{itemize}
And more text

C:

Some text:
\begin{itemize}
\item A
\item B
\end{itemize}

And more text

D:

Some text:

\begin{itemize}
\item A
\item B
\end{itemize}

And more text

Leading underscore works to distinguish A from C. But not to distinguish A from B, nor C from D. It catches part of the A/D distinction.

Indentation doesn't work for any of them.
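The four cases differ in two independent choices: whether there is a paragraph break before the list, and whether there is one after it. A small sketch that generates the LaTeX for each combination makes this explicit. The function `latex_list_in_text` and its parameter names are invented for this illustration.

```python
def latex_list_in_text(before, items, after, break_before, break_after):
    """Build LaTeX source for text, an itemize list, and more text.

    A blank line is emitted before and/or after the list depending on the
    two flags; in LaTeX a blank line ends the current paragraph.
    """
    lines = [before]
    if break_before:
        lines.append("")  # paragraph break between the text and the list
    lines.append("\\begin{itemize}")
    lines.extend("\\item " + item for item in items)
    lines.append("\\end{itemize}")
    if break_after:
        lines.append("")  # paragraph break between the list and the text
    lines.append(after)
    return "\n".join(lines)


# A: no breaks, B: break before, C: break after, D: both.
cases = {"A": (False, False), "B": (True, False), "C": (False, True), "D": (True, True)}
for name, (before_flag, after_flag) in cases.items():
    print(name + ":")
    print(latex_list_in_text("Some text:", ["A", "B"], "And more text", before_flag, after_flag))
    print()
```

Read this way, a marker on the text that follows the list can only carry the second flag, which matches the observation that it separates A from C but not A from B or C from D.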

An alternative design is a convention that there's a div for "multi-paragraphs" that contain multiple block elements. It's ugly but accurate:

::: {.paragraph}
Some text:

* A
* B

And more text
:::

would denote option A.

This would be tool-specific, but that's perhaps OK: I think the need for this kind of thing tends to arise in long-form scientific writing more than in smaller, casual documents, so having a Googleable solution like this seems acceptable. This also remains compatible with the various ASTs out there.

jgm commented 11 months ago

One could use a single dot on a line as a "connector" that says: the following normally-block-level thing is to be considered as part of the current paragraph. Then your A is

Some text
.
- A
- B
.
more text

and your B is

Some text

- A
- B
.
more text

and so on. Of course, this would require figuring out an AST model that actually permits this sort of thing. And some (most?) output formats just won't allow a list or a block quote to be part of a paragraph: in HTML for example, a p element can only contain "phrasing content."
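To make the connector idea concrete, here is a minimal sketch of how a reader might split source text into chunks and record which chunks are joined to their predecessor. It is an assumption-laden toy, not djot's parser: chunks are delimited only by blank lines or by a line consisting of a single dot, and the name `split_chunks` is hypothetical.

```python
def split_chunks(source):
    """Split source into chunks of consecutive non-separator lines.

    Returns a list of (lines, joined) pairs. `joined` is True when the
    separator immediately before the chunk was a lone "." connector, and
    False when it was a blank line (or the chunk is the first one).
    """
    chunks = []
    current = []
    joined_next = False  # was the most recent separator a "." connector?
    for line in source.splitlines():
        stripped = line.strip()
        if stripped in ("", "."):
            if current:
                chunks.append((current, joined_next))
                current = []
            # The last separator seen decides how the next chunk attaches.
            joined_next = stripped == "."
        else:
            current.append(line)
    if current:
        chunks.append((current, joined_next))
    return chunks


case_a = "Some text\n.\n- A\n- B\n.\nmore text"
case_b = "Some text\n\n- A\n- B\n.\nmore text"
for source in (case_a, case_b):
    print([(lines, joined) for lines, joined in split_chunks(source)])
```

In case A both the list and the trailing text are marked as joined; in case B only the trailing text is. How such flags would then map onto an AST is the open question raised above.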

mygithubdevaccount commented 7 months ago

AsciiDoc uses the plus sign (+) as a so-called list continuation marker: https://docs.asciidoctor.org/asciidoc/latest/syntax-quick-reference/#ex-complex