Ambiguity of leading whitespace that precedes sigil

subconsciousnetwork / subtext

Markup for note taking

Apache License 2.0


Ambiguity of leading whitespace that precedes sigil #46

Closed cdata closed 1 year ago

cdata commented 1 year ago

I'm currently reworking some of the Rust subtext parser's AST representation to line it up more closely with the spec. I'm trying to reason about how to treat leading whitespace. This is important in the context of https://github.com/subconsciousnetwork/subtext/issues/31; if leading whitespace has presentational significance for a list block (it doesn't yet, but it might in the future), then some amount of leading whitespace ought to be captured as context for that block. This may be true for other blocks even if that leading whitespace isn't used for presentational purposes in those cases.

At any rate, right now the Rust parser considers any leading whitespace to be part of a Blank that precedes the content. But this actually violates the intention of the spec in its own way, I think.

Would you be willing to clarify in the spec how leading whitespace that precedes a sigil on a given line should be treated?

gordonbrander commented 1 year ago

The current intent of the specification is that a line either begins with a valid sigil or is text, so:

> This is a quote block
    > This is not a quote block. This is just text.
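
A minimal Rust sketch of that rule, for illustration only. The `Block` enum and `classify` function are hypothetical names, not the actual parser's API; only the two sigils shown in this thread are modeled, and treating whitespace-only lines as `Blank` is an assumption.

```rust
/// Hypothetical block types; only the sigils that appear in this thread are modeled.
#[derive(Debug, PartialEq)]
enum Block {
    Quote(String),
    List(String),
    Blank,
    Text(String),
}

/// Classify one line under the current rule: a sigil only counts
/// when it is the very first character of the line.
fn classify(line: &str) -> Block {
    if let Some(rest) = line.strip_prefix('>') {
        Block::Quote(rest.trim().to_string())
    } else if let Some(rest) = line.strip_prefix('-') {
        Block::List(rest.trim().to_string())
    } else if line.trim().is_empty() {
        // Assumption: whitespace-only lines are treated as Blank here.
        Block::Blank
    } else {
        // Anything else, including an indented sigil, is plain text.
        Block::Text(line.to_string())
    }
}
```

With this sketch, `classify("    > indented")` yields a `Text` block that keeps the leading whitespace, while `classify("> quoted")` yields a `Quote`.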

If we were to introduce leading whitespace in the future, then we would need to introduce some lookahead logic. For example, if we were to allow leading indentation for list blocks, we might do it like this:

- This is a list item
    - This is a list item
This is just text
    > This is also just text
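
One way the lookahead for this list-only variant might look, sketched in Rust. The names `Block` and `classify`, the `indent` field, and the choice to count spaces and tabs as indentation are all assumptions made for illustration; nothing here is taken from the spec or the real parser.

```rust
/// Hypothetical block types for the "only lists may be indented" variant.
#[derive(Debug, PartialEq)]
enum Block {
    Quote(String),
    List { indent: usize, text: String },
    Text(String),
}

fn classify(line: &str) -> Block {
    // Look ahead past leading spaces and tabs to the first significant character.
    let rest = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    // Spaces and tabs are one byte each, so this is the count of skipped characters.
    let indent = line.len() - rest.len();

    if let Some(item) = rest.strip_prefix('-') {
        // Only list items accept leading whitespace; the amount is recorded.
        Block::List { indent, text: item.trim().to_string() }
    } else if let Some(quote) = line.strip_prefix('>') {
        // Every other sigil must still sit at column zero.
        Block::Quote(quote.trim().to_string())
    } else {
        Block::Text(line.to_string())
    }
}
```

Under these assumptions, the four example lines above classify as a list item with indent 0, a list item with indent 4, text, and text.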

Alternatively, we might allow any kind of block to be indented (not my first preference), in which case:

- This is a list item
    - This is a list item
This is just text
    > This is a quote

...And each block would keep a record of its indentation level.
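
A rough Rust sketch of this second variant, where every block carries its indentation level. `Kind`, `Block`, and `classify` are hypothetical names, and measuring indentation as a count of leading spaces and tabs is an assumption rather than anything the spec defines.

```rust
/// Hypothetical block kinds; only the sigils from this thread are modeled.
#[derive(Debug, PartialEq)]
enum Kind {
    Quote,
    List,
    Text,
}

/// Every block records how far it was indented.
#[derive(Debug, PartialEq)]
struct Block {
    kind: Kind,
    indent: usize,
    text: String,
}

fn classify(line: &str) -> Block {
    // Skip leading spaces and tabs, remembering how many were skipped.
    let rest = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let indent = line.len() - rest.len();

    // The sigil is matched against the first non-whitespace character.
    let (kind, body) = if let Some(b) = rest.strip_prefix('>') {
        (Kind::Quote, b)
    } else if let Some(b) = rest.strip_prefix('-') {
        (Kind::List, b)
    } else {
        (Kind::Text, rest)
    };

    Block { kind, indent, text: body.trim().to_string() }
}
```

Here `classify("    > This is a quote")` produces a `Quote` with `indent` 4, in contrast to the list-only variant where the same line stays text.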

cdata commented 1 year ago

Just observing: if such leading whitespace is parsed as a paragraph or a blank, it's probably not what the user intended. These interpretations would cause a new block to appear in the AST where there otherwise would not be one (and would lead to weird rendering of the document unless the renderer cares to discriminate by the contents of the block itself, but then that would undermine the value of the parser somewhat).

gordonbrander commented 1 year ago

One line should always equal one block, so the rules above should never cause a new block to appear in the AST where otherwise there would not be one. They would only change the type of the block. E.g. a whitespace-only line might cause a Blank block to become a Text block.
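
The one-line-one-block invariant can be illustrated with a small Rust sketch. `Block`, `classify`, and `parse` are hypothetical names, the classification follows the sigil-at-column-zero rule, and treating whitespace-only lines as `Blank` is an assumption. Because `parse` maps each line to exactly one block, swapping in different whitespace rules inside `classify` can change a block's type but never the number of blocks.

```rust
/// Hypothetical block types; only the sigils from this thread are modeled.
#[derive(Debug, PartialEq)]
enum Block {
    Quote(String),
    List(String),
    Blank,
    Text(String),
}

/// Per-line classification; a sigil must be the first character.
fn classify(line: &str) -> Block {
    if let Some(rest) = line.strip_prefix('>') {
        Block::Quote(rest.trim().to_string())
    } else if let Some(rest) = line.strip_prefix('-') {
        Block::List(rest.trim().to_string())
    } else if line.trim().is_empty() {
        // Assumption: whitespace-only lines are Blank.
        Block::Blank
    } else {
        Block::Text(line.to_string())
    }
}

/// Each input line produces exactly one block, so the output length
/// always equals the number of lines in the source.
fn parse(source: &str) -> Vec<Block> {
    source.lines().map(classify).collect()
}
```

For any input, `parse(source).len()` equals `source.lines().count()`.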

cdata commented 1 year ago

Ah, okay. Sorry, I think I misunderstood your response.

- is a list
 - is a paragraph