jgm / djot

A light markup language
https://djot.net
MIT License

Inline container precedence with attributes #211

Open hellux opened 1 year ago

hellux commented 1 year ago

In e.g. djot.js:

*{a="*"}

yields

<p><span a="*">*</span></p>

while

*{a="*"

yields

<p><strong>{a=&rdquo;</strong>&rdquo;</p>

The general inline precedence rule suggests that "the first opener that gets closed takes precedence". In this case, however, the attributes have precedence even though they are closed after the * is closed.

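The first goal of djot is to allow parsing without backtracking, but can this case really be parsed without any backtracking? When encountering the second *, we have to consider two possible outcomes: either the attribute block is closed later with }, in which case the * is part of the quoted value, or it is never closed, in which case the * closes the emphasis opened by the first *.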

How to handle this specific case isn't really specified in the syntax reference. The djot.js behavior is probably more user-friendly than following the general rule, as one would not expect quoted symbols to have impact. But from an implementation point of view it seems difficult to prioritize attributes while allowing arbitrary content in quoted attribute values without using backtracking.

Not sure whether it is intentional or not, but djot.js does not seem to allow completely arbitrary content within the quotes, e.g.

*{a=[txt](url)

turns into

<p>*{a=&ldquo;[txt](url)</p>

instead of

<p>*{a=<a href="url">txt</a></p>

jgm commented 1 year ago

The rules you're quoting are said to cover "precedence for inline containers." The way I was thinking of it, that excludes things like code spans and attributes. These have precedence over inline containers. In a full spec, all of this would need to be spelled out more explicitly.

Maybe the other issue is now fixed? With the latest in main I'm getting:

% ./djot 
*{a=[txt](url)
<p>*{a=<a href="url">txt</a></p>

hellux commented 1 year ago

> The rules you're quoting are said to cover "precedence for inline containers." The way I was thinking of it, that excludes things like code spans and attributes. These have precedence over inline containers. In a full spec, all of this would need to be spelled out more explicitly.

Yes, and this precedence is probably the better alternative, but it does require backtracking to parse. You have to try parsing attributes first, and if that fails you have to go back and parse the same input again as ordinary inline content. I guess it should still be parsable in linear time, though, since it is not really possible to nest arbitrarily many attributes within each other (trying to open another quote/comment will close the previous one).
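
To make the "try attributes first, fall back on failure" idea concrete, here is a minimal TypeScript sketch. It is not taken from djot.js: `scanAttributes` and `tryAttributes` are hypothetical names, and the scanner is deliberately simplified. It only tracks double quotes, backslash escapes and the closing brace, and does not validate real attribute syntax (keys, classes, ids, comments).

```typescript
// Hypothetical helper, NOT part of djot.js.
// Scans a candidate attribute block starting at `start`, which must point
// at "{". Returns the index just past the closing "}" on success, or null
// if the block is never closed before the end of input.
function scanAttributes(src: string, start: number): number | null {
  if (src[start] !== "{") return null;
  let i = start + 1;
  let inQuote = false;
  while (i < src.length) {
    const c = src[i];
    if (inQuote) {
      if (c === "\\") {
        i += 2; // skip the escaped character inside a quoted value
        continue;
      }
      if (c === '"') inQuote = false;
    } else if (c === '"') {
      inQuote = true; // from here on, "*" and "}" are just value content
    } else if (c === "}") {
      return i + 1;
    }
    i++;
  }
  return null; // unterminated: the caller has to re-parse from `start`
}

type Outcome =
  | { kind: "attributes"; end: number } // consume src[start..end) as attributes
  | { kind: "fallback"; resumeAt: number }; // backtrack and parse as inlines

// Assumed control flow: attempt attributes first, otherwise backtrack.
function tryAttributes(src: string, start: number): Outcome {
  const end = scanAttributes(src, start);
  return end === null
    ? { kind: "fallback", resumeAt: start }
    : { kind: "attributes", end };
}
```

With this sketch, `tryAttributes('*{a="*"}', 1)` takes the attributes branch, so the quoted `*` never gets a chance to close the emphasis. `tryAttributes('*{a="*"', 1)` takes the fallback branch, and the same characters are then scanned a second time as ordinary inlines. That second scan is the backtracking in question.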

> Maybe the other issue is now fixed? With the latest in main I'm getting:
>
> % ./djot
> *{a=[txt](url)
> <p>*{a=<a href="url">txt</a></p>

I think I accidentally typed the example without quotes. With quotes it still parses as text on the main branch:

% ./djot
*{a="[txt](url)
<p>*{a=&ldquo;[txt](url)</p>