matklad opened this issue 1 year ago.
True, that might be a reason. On the other hand, the notion of an `esc` is really specific to the source representation and doesn't make much sense in an AST.
Re your idea for metadata: why not use symbols?
- :author: John M
- :title: The Book
Then you don't have to worry about splitting strings at all.
Yeah, that would work. One problem with symbols, though, is that they restrict the names of the keys. For example, if I want hierarchical keys, I can write
: author.first_name = Alex
but I can't
- :author.first_name: Alex
Not sure how big of a problem it is. On the one hand, it seems mostly like a non-issue to me. On the other hand, formats like TOML usually add specific facilities for writing keys-which-are-not-valid-idents.
Another case would be `:автор:` (Russian for "author"), which isn't a symbol but maybe wants to be a metadata key?
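To make the trade-off concrete, here is a small Python sketch. It assumes symbol names are limited to ASCII letters, digits, `_`, `+` and `-`; that character set is an assumption and should be checked against the djot syntax reference. The `set_dotted` helper is hypothetical and shows what a metadata filter might do with dotted keys written in the `: key = value` form.

```python
import re

# Assumption (not taken from the djot spec): a symbol name is a run of ASCII
# letters, digits, "_", "+" or "-". Verify against the djot syntax reference.
SYMBOL_NAME = re.compile(r"[A-Za-z0-9_+-]+")


def usable_as_symbol(key: str) -> bool:
    """True if `key` could be written as a :key: symbol under the assumption above."""
    return SYMBOL_NAME.fullmatch(key) is not None


def set_dotted(meta: dict, key: str, value: str) -> None:
    """Hypothetical helper: store `value` under a dotted key as nested dicts."""
    *parents, last = key.split(".")
    node = meta
    for part in parents:
        node = node.setdefault(part, {})
    node[last] = value


for key in ["author", "title", "author.first_name", "автор"]:
    print(f"{key!r}: {usable_as_symbol(key)}")

meta: dict = {}
set_dotted(meta, "author.first_name", "Alex")
print(meta)  # {'author': {'first_name': 'Alex'}}
```

Under this assumption, both `author.first_name` and `автор` are rejected as symbol names, while the `=` form accepts any key text.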
> and doesn't make much sense in an AST.
I am in a state of mind where I can see it both ways. On the one hand, yes, the very purpose of an escape is to not be in the AST.
On the other hand, djot is extensible, and we already have some embedding of languages into djot: verbatim blocks embed programming languages, and math embeds LaTeX. With filters and a formalized AST, we can actually generalize this idea and have some fragments which are djot but also carry some extra meaning, e.g., a filter that adds a new pair of emphasis characters or something. For these use cases, preserving escapes feels useful.
Although I am not sure we actually should support such non-verbatim embeddings: the whole idea behind djot is that you don't need to invent custom syntaxes, because spans with attributes should be enough for anything...
I'm not sure how important it is to support this kind of kludge.
Still, I'm not sure. I see the value of the `esc` idea, and I'm on the fence.
I think if we add escapes to the AST, the principled generalization of that would be to require that the AST be lossless (i.e., require it to be a concrete syntax tree).
And if I view it that way, it seems better to keep the AST abstract and rely on matches for the concrete stuff.
Zulip’s stream/topic link and user/group mention syntaxes are examples of custom markup features that would ideally be implemented as AST postprocessors respecting this `str`/`esc` distinction. (We might change the exact syntaxes a bit if we migrate to Djot, but I’m not sure we’d want to change them drastically enough to comport with the span attribute syntax?)
The “completely erasing escapes” option would complicate what I imagine will be a common use case for the Djot AST. If you’re building a Djot editor with source and preview panes, you want the AST augmented with locations that help you map mouse clicks in the preview pane back to positions in the source pane, and that mapping needs an adjustment for every skipped backslash.
Well, actually, you could probably infer the presence of a backslash from the source locations we already have in the AST!
```
% djot -t astpretty -p
escaped\"quote
doc
  para (1:1:0-2:0:14)
    str (1:1:0-1:7:6) text="escaped"
    str (1:9:8-1:14:13) text="\"quote"
```
Note the gap at 1:8, which could only be caused by an escape.
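A client could recover that information with a heuristic like the following Python sketch: compare the end offset of each `str` node with the start of the next one, and treat a one-character gap as a skipped backslash. `StrNode` and `escape_gaps` are hypothetical names; the numbers are the inclusive byte offsets (the last field of each `line:col:offset` triple) from the astpretty output above. The gap alone does not prove that the missing character was a backslash.

```python
from typing import List, NamedTuple


class StrNode(NamedTuple):
    start: int  # inclusive source offset of the first character
    end: int    # inclusive source offset of the last character
    text: str


def escape_gaps(nodes: List[StrNode]) -> List[int]:
    """Return source offsets skipped between adjacent str nodes.

    A one-character gap is what an escaping backslash would leave behind;
    this is a heuristic, not a guarantee.
    """
    gaps = []
    for prev, nxt in zip(nodes, nodes[1:]):
        if nxt.start - prev.end == 2:  # exactly one source character missing
            gaps.append(prev.end + 1)
    return gaps


nodes = [StrNode(0, 6, "escaped"), StrNode(8, 13, '"quote')]
print(escape_gaps(nodes))  # [7]
```

Offset 7 is the position of the backslash in `escaped\"quote`, i.e. the gap at 1:8.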
> We might change the exact syntaxes a bit if we migrate to Djot
Wait, my favorite chat software is considering to migrate to my favorite light markup language? Lovely! :-)
There are some edge cases where you can’t tell at present.
```
$ ./djot -t astpretty -p
x{}\@
doc
  para (1:1:0-2:0:5)
    str (1:1:0-1:1:0) text="x"
    str (1:5:4-1:5:4) text="@"
```

```
$ ./djot -t astpretty -p
x{ }@
doc
  para (1:1:0-2:0:5)
    str (1:1:0-1:1:0) text="x"
    str (1:5:4-1:5:4) text="@"
```
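The Python sketch below shows why offsets alone don't suffice here: both inputs yield identical `str` ranges, so the gap is three characters wide in each case. If the consumer also has the source text, peeking at the character just before the node separates these two inputs, though that check is still a heuristic. `StrNode`, `gap_widths` and `backslash_before` are hypothetical helpers, not part of any djot API.

```python
from typing import List, NamedTuple


class StrNode(NamedTuple):
    start: int  # inclusive source offset of the first character
    end: int    # inclusive source offset of the last character
    text: str


def gap_widths(nodes: List[StrNode]) -> List[int]:
    """Number of source characters skipped between consecutive str nodes."""
    return [nxt.start - prev.end - 1 for prev, nxt in zip(nodes, nodes[1:])]


def backslash_before(source: str, node: StrNode) -> bool:
    """Heuristic: is the source character just before `node` a backslash?

    This can misfire when that backslash is itself escaped by another one.
    """
    return node.start > 0 and source[node.start - 1] == "\\"


# Offsets copied from the two astpretty dumps above.
src_escaped, src_plain = "x{}\\@", "x{ }@"
escaped = [StrNode(0, 0, "x"), StrNode(4, 4, "@")]
plain = [StrNode(0, 0, "x"), StrNode(4, 4, "@")]

print(gap_widths(escaped), gap_widths(plain))  # [3] [3]
print(backslash_before(src_escaped, escaped[1]),
      backslash_before(src_plain, plain[1]))   # True False
```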
Since source location and escape info are both meta-source info, maybe they belong together, along with any other such info that could be added down the road. For example, the `*` below explicitly indicates the presence of an escape character just before that range:
```
doc
  para (1:1:0-2:0:14)
    str (1:1:0-1:7:6) text="escaped"
    str (*1:9:8-1:14:13) text="\"quote"
```
The downside may also be an upside: disabling the emission of source locations would also disable the emission of escape info. The argument for it being an upside is that a client either depends on the source or it doesn't.
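One way to model this proposal, as a Python sketch: the escape marker is a flag on the source range, so it is printed, or dropped, together with the location. `Pos`, `SourceRange`, `escaped_before` and `render_str` are hypothetical names, and the output only imitates the astpretty format shown above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pos:
    line: int
    col: int
    offset: int

    def __str__(self) -> str:
        return f"{self.line}:{self.col}:{self.offset}"


@dataclass(frozen=True)
class SourceRange:
    start: Pos
    end: Pos
    # Hypothetical flag for the proposed "*" marker: an escaping backslash
    # sits immediately before `start` in the source.
    escaped_before: bool = False

    def __str__(self) -> str:
        marker = "*" if self.escaped_before else ""
        return f"{marker}{self.start}-{self.end}"


def render_str(text: str, rng: Optional[SourceRange]) -> str:
    """Render a str node astpretty-style; with no range, escape info is gone too."""
    loc = f" ({rng})" if rng is not None else ""
    return f"str{loc} text={json.dumps(text)}"


rng = SourceRange(Pos(1, 9, 8), Pos(1, 14, 13), escaped_before=True)
print(render_str('"quote', rng))   # str (*1:9:8-1:14:13) text="\"quote"
print(render_str('"quote', None))  # str text="\"quote"
```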
And a variation of @matklad's idea that keeps the escape info separate:
```
str (source loca) text="key "
esc (source loca)
str (source loca) text="= value"
```
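A Python sketch of this variant, assuming (our reading of the three lines above) that `esc` covers only the backslash while the escaped character stays at the start of the following `str`. Consumers that don't care about escapes can simply skip `esc` nodes, while source-aware tools can put the backslashes back. The node classes and both functions are hypothetical; `to_source` only handles this simple case and does not re-escape anything else.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:
    text: str


@dataclass
class Esc:
    """Hypothetical node standing for one escaping backslash."""


Inline = Union[Str, Esc]


def to_plain_text(nodes: List[Inline]) -> str:
    """What a consumer that ignores escapes sees."""
    return "".join(n.text for n in nodes if isinstance(n, Str))


def to_source(nodes: List[Inline]) -> str:
    """Re-emit the backslashes, recovering the escaped source text."""
    return "".join("\\" if isinstance(n, Esc) else n.text for n in nodes)


# Assumed encoding of `key \= value`.
nodes: List[Inline] = [Str("key "), Esc(), Str("= value")]
print(to_plain_text(nodes))  # key = value
print(to_source(nodes))      # key \= value
```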
> Wait, my favorite chat software is considering to migrate to my favorite light markup language? Lovely! :-)
Yeah, we’ve been seriously looking at it. The main issues that came up in our evaluation are:
I'm definitely tempted to include an `esc` element in the AST, even though it's somewhat against the spirit of an AST. Arguably, though, so is distinguishing between spaces and softbreaks.
The following djot:
gives this AST:
I think it should either be:
or something like:
The first option (completely erasing escapes) seems more natural, but I think I have an argument for the second one.
As djot is extensible, certain filters might overlay extra semantics on top of djot syntax. For example, if I really don't want to add an extra line between the term and the definition in a `:` list, I might write a custom filter that splits the term on `=`. There would be a corner case: what if the term itself contains an `=`? A natural solution would be to escape it:
But for that to work, the `\` needs to be preserved in the AST.
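A Python sketch of such a filter, assuming escapes survive as separate `esc` nodes; the node classes and `split_term` are hypothetical, not djot's API. An `=` counts as a separator unless it is the first character of the `str` right after an `esc` node. With escapes erased, the same term collapses to a single `str`, and the filter splits at the wrong place.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Str:
    text: str


@dataclass
class Esc:
    """Hypothetical node marking one escaping backslash."""


Inline = Union[Str, Esc]


def split_term(nodes: List[Inline]) -> Optional[Tuple[str, str]]:
    """Split a definition-list term on its first unescaped "=".

    Returns (term, definition), or None if there is no unescaped "=".
    """
    term: List[str] = []
    escaped = False  # True right after an Esc node
    for i, node in enumerate(nodes):
        if isinstance(node, Esc):
            escaped = True
            continue
        for j, ch in enumerate(node.text):
            if ch == "=" and not escaped:
                rest = node.text[j + 1:] + "".join(
                    n.text for n in nodes[i + 1:] if isinstance(n, Str))
                return "".join(term).strip(), rest.strip()
            term.append(ch)
            escaped = False  # an escape applies to one character only
    return None


# `x \= y = value`, with and without the escape kept in the tree.
with_esc: List[Inline] = [Str("x "), Esc(), Str("= y = value")]
erased: List[Inline] = [Str("x = y = value")]
print(split_term(with_esc))  # ('x = y', 'value')
print(split_term(erased))    # ('x', 'y = value')
```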