jgm / djot

A light markup language
https://djot.net
MIT License

Hierarchical sections in the AST #86

Closed matklad closed 1 year ago

matklad commented 1 year ago

For something like

# top level

## section

content

### subsection

content

## another section

more content

Djot currently produces a "flat" ast:

doc
  heading level="1" id="top-level"
    str s="top level"
  heading level="2" id="section"
    str s="section"
  para
    str s="content"
  heading level="3" id="subsection"
    str s="subsection"
  para
    str s="content"
  heading level="2" id="another-section"
    str s="another section"
  para
    str s="more content"

We could produce a nested one with explicit sections:

doc
  heading id="top-level"
    str s="top level"
  sect
    heading id="section"
      str s="section"
    para
      str s="content"
    sect
      heading id="subsection"
        str s="subsection"
      para
        str s="content"
  sect
    heading id="another-section"
      str s="another section"
    para
      str s="more content"

The question is, should we?

My naive answer is yes, we should, as it makes the structure much friendlier to processing (e.g., I can trivially convert it to HTML's section elements). It's also interesting that this essentially gets rid of the level attribute, as the level becomes implicit in the nesting structure.
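To make the "trivial conversion" concrete, here is a minimal sketch that renders a nested tree to HTML and derives each heading level from the section depth. The dict-based node shape and the `node` and `render` helpers are assumptions made for illustration; they are not djot's actual AST or API.

```python
from html import escape


def node(tag, *children, **attrs):
    """Build a hypothetical AST node: a dict with a tag, children, and attributes."""
    return {"tag": tag, "children": list(children), **attrs}


def render(n, depth=0):
    """Render a nested AST to HTML; `depth` counts the enclosing "sect" nodes."""
    tag = n["tag"]
    if tag == "str":
        return escape(n["text"])
    # Children of a section sit one level deeper than the section itself.
    child_depth = depth + 1 if tag == "sect" else depth
    inner = "".join(render(c, child_depth) for c in n["children"])
    if tag == "doc":
        return inner
    if tag == "sect":
        return f"<section>{inner}</section>"
    if tag == "heading":
        # No stored level attribute: the level is implied by the nesting.
        level = depth + 1
        return f'<h{level} id="{escape(n["id"])}">{inner}</h{level}>'
    if tag == "para":
        return f"<p>{inner}</p>"
    raise ValueError(f"unknown node tag: {tag}")


doc = node(
    "doc",
    node("heading", node("str", text="top level"), id="top-level"),
    node(
        "sect",
        node("heading", node("str", text="section"), id="section"),
        node("para", node("str", text="content")),
        node(
            "sect",
            node("heading", node("str", text="subsection"), id="subsection"),
            node("para", node("str", text="content")),
        ),
    ),
)
html = render(doc)
```

The renderer never reads a level attribute, which is the point made above: with explicit sections the level can be recomputed from the tree at any time.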

But there's a nasty problem that header-based sections don't compose with explicit blocks. What should be the AST for the following code?

# Sect

## SubSect1

:::
## SubSect2
:::

## SubSect3

jgm commented 1 year ago

This is indeed a tricky problem. Pandoc handles it by having a separate function makeSections that imposes the section structure on an AST without it.
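As a rough illustration of such a post-pass, the sketch below imposes sections on a flat block list with a stack of open sections. It is not pandoc's implementation; the `make_sections` name and the dict-based node shape are assumptions. Unlike the nested tree proposed in the issue, it also wraps the level-1 heading in a section.

```python
def make_sections(blocks):
    """Group a flat list of blocks into nested "sect" nodes by heading level."""
    root = {"tag": "doc", "children": []}
    # Stack of (level, container). The root sits at level 0 and is never popped,
    # because heading levels start at 1.
    stack = [(0, root)]
    for block in blocks:
        if block["tag"] == "heading":
            level = block["level"]
            # A new heading closes every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            sect = {"tag": "sect", "children": [block]}
            stack[-1][1]["children"].append(sect)
            stack.append((level, sect))
        else:
            # Non-heading blocks belong to the innermost open section.
            stack[-1][1]["children"].append(block)
    return root


# The flat AST from the issue, reduced to the fields the sketch needs.
flat = [
    {"tag": "heading", "level": 1, "id": "top-level"},
    {"tag": "heading", "level": 2, "id": "section"},
    {"tag": "para", "text": "content"},
    {"tag": "heading", "level": 3, "id": "subsection"},
    {"tag": "para", "text": "content"},
    {"tag": "heading", "level": 2, "id": "another-section"},
    {"tag": "para", "text": "more content"},
]
tree = make_sections(flat)
```

With this approach a jump from level 2 to level 4 simply nests the deeper heading under the nearest shallower one, so skipped levels do not need special handling in the pass itself.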

I've long struggled to get the right approach to handling nested divs like the one you display above.

One might use these divs for things like callouts that shouldn't be in the hierarchical structure. So in pandoc, we generally look for divs with the class "section." But that's a bit hacky.
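One possible policy for the div example, sketched below, is to treat every div as an opaque block: headings inside it neither open nor close sections outside it. This is only an illustration of that one policy, not what pandoc or djot does; `sectionize`, the helper `h`, and the node shape are hypothetical.

```python
def sectionize(blocks):
    """Nest blocks into "sect" nodes; a "div" is passed through as an opaque block."""
    root = []
    # Stack of (heading level, children list); the root entry is never popped.
    stack = [(0, root)]
    for block in blocks:
        if block["tag"] == "heading":
            # Close open sections at the same or a deeper level.
            while stack[-1][0] >= block["level"]:
                stack.pop()
            sect = {"tag": "sect", "children": [block]}
            stack[-1][1].append(sect)
            stack.append((block["level"], sect["children"]))
        else:
            # Paragraphs and divs alike attach to the innermost open section.
            # A div's children are never inspected, so headings inside it
            # cannot affect the surrounding section structure.
            stack[-1][1].append(block)
    return root


def h(level, text):
    """Build a hypothetical heading node."""
    return {"tag": "heading", "level": level, "text": text}


# The problematic document from the issue: SubSect2 is wrapped in a div.
blocks = [
    h(1, "Sect"),
    h(2, "SubSect1"),
    {"tag": "div", "children": [h(2, "SubSect2")]},
    h(2, "SubSect3"),
]
tree = sectionize(blocks)
```

Under this policy SubSect2 stays a plain heading inside the div, the div lands inside the SubSect1 section, and SubSect3 becomes a sibling of SubSect1. Whether that is the desired result is exactly the open question in this thread.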

I'm tempted to go for the hierarchical structure right off for djot, because we don't need to worry as much about e.g. supporting someone who wants to go from h2 to h4 in their HTML output. But I'm still a bit unsure.

matklad commented 1 year ago

> One might use these divs for things like callouts that shouldn't be in the hierarchical structure.

Yeah... And this might be useful for titles/captions:

::: example
# How to use Djot on the web

You can compile Lua to WebAssembly ...
:::

Not sure how I feel about overloading # for "block" titles: it's perhaps not super obvious that they are relative to the div.