jgm / djot

A light markup language
https://djot.net
MIT License
1.66k stars 43 forks source link

Considering lossless AST #108

Open matklad opened 1 year ago

matklad commented 1 year ago

Inspired by https://github.com/jgm/djot/issues/105#issuecomment-1316110154

There's such thing as a lossless syntax tree. Basically, it's a tree with the constraint that it's possible to exactly recover the input string from it, print . parse = id. We might, or might not want to use it.

Benefits of full-fidelity trees:

Benefits of AST:

matklad commented 1 year ago

In Djot, building a natural lossless tree would be a bit challenging. The "range of parent is disjoint union of children" won't work for indented constructs like lists or blockquotes

>  ```
>  fn main() {
>    "hello"
>  }
>  ```

I think we can invent some reperesentation which would capture this, but it won't be trivial.

matklad commented 1 year ago

In general, my gut feeling that our guiding principle should be this:

At the same time, implementaions can provide a lossless "stream of matches" which can be used for all purposes where AST is not enough. Matches are not part of the spec, so implementations can approach the problem differently.

chrisjsewell commented 1 year ago

At the same time, implementaions can provide a lossless "stream of matches" which can be used for all purposes where AST is not enough.

Ha yeh, I was just about to ask why you feel this would not be sufficient

matklad commented 1 year ago

cc https://github.com/jgm/djot/issues/97#issuecomment-1314459791

wooorm commented 1 year ago

This is called a CST btw. Concrete syntax tree vs Abstract syntax tree