Closed by jgm 1 year ago.
Found with make fuzz.
The inline parser should be able to monitor itself for lack of progress and abort instead of looping.
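One way to do this is a progress check in the driver loop. The Lua sketch below assumes a matcher-style `step(pos)` that returns the next position to scan. `run_with_progress_check`, `ok_step` and `stuck_step` are hypothetical names, not part of djot. The loop raises an error as soon as a step fails to advance `pos`, rather than spinning forever.

```lua
-- Hypothetical driver loop: `step(pos)` stands in for a matcher and
-- returns the next position to scan. If a step ever fails to move
-- forward, abort with an error instead of looping forever.
local function run_with_progress_check(step, spos, endpos)
  local pos = spos
  while pos <= endpos do
    local newpos = step(pos)
    if type(newpos) ~= "number" or newpos <= pos then
      error(string.format("no progress at position %d (step returned %s)",
                          pos, tostring(newpos)))
    end
    pos = newpos
  end
  return pos
end

-- A well-behaved step: always advances by one character.
local function ok_step(pos)
  return pos + 1
end

-- A buggy step that gets stuck at position 3, like a matcher that
-- consumes zero characters.
local function stuck_step(pos)
  if pos == 3 then
    return pos
  end
  return pos + 1
end

print(run_with_progress_check(ok_step, 1, 5))  --> 6

-- pcall returns false plus an error message naming position 3.
local ok, err = pcall(run_with_progress_check, stuck_step, 1, 5)
print(ok, err)
```

Turning the hang into an error makes a fuzzer report it as a failure immediately instead of as a timeout.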
Also relevant: this input gives an overlapping match:
{1-}
+para 1-1
str 1-4
str 4-4
-para 5-5
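The overlap is between str 1-4 and str 4-4: both cover position 4, the closing brace. A check like the following could catch this in a fuzz run. It is a Lua sketch under stated assumptions. `find_overlap` is a hypothetical helper. Leaf matches are assumed to be sorted by start position. Container events such as +para/-para are assumed to be filtered out first, because they legitimately share positions with their contents.

```lua
-- Leaf matches as {startpos, endpos, annotation}, inclusive positions,
-- mirroring the "str 1-4" lines above. Container events such as
-- +para/-para are left out: they share positions with their contents
-- by design. Assumes matches are sorted by start position.
local function find_overlap(matches)
  local furthest  -- earlier match reaching furthest to the right
  for _, m in ipairs(matches) do
    if furthest and m[1] <= furthest[2] then
      return furthest, m
    end
    if not furthest or m[2] > furthest[2] then
      furthest = m
    end
  end
  return nil
end

-- The two str matches reported for {1-}
local matches = {
  {1, 4, "str"},
  {4, 4, "str"},
}

local a, b = find_overlap(matches)
if a then
  print(string.format("%s %d-%d overlaps %s %d-%d",
                      a[3], a[1], a[2], b[3], b[1], b[2]))
end
--> str 1-4 overlaps str 4-4
```

Tracking the furthest-reaching earlier match, rather than only the previous one, also catches a long match overlapping a later, non-adjacent one.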
I have improved things somewhat (no more hangs), but now {1--} produces the same doubled character noted above for {1-}. The problem is quite mysterious.
Add print statements to djot/inline.lua as follows:
--- a/djot/inline.lua
+++ b/djot/inline.lua
@@ -497,7 +497,9 @@ function Tokenizer:feed(spos, endpos)
     self.lastpos = endpos
   end
   pos = spos
+  print("before while loop")
   while pos <= endpos do
+    print("top of loop: pos = ", pos, "endpos = ", endpos)
     if self.attribute_tokenizer then
       local sp = pos
       local ep2 = bounded_find(subject, special, pos, endpos) or endpos
@@ -590,6 +592,7 @@ function Tokenizer:feed(spos, endpos)
         local matcher = matchers[c]
         pos = (matcher and matcher(self, pos, endpos)) or self:single_char(pos)
       end
+      print("pos adjusted to ", pos)
     end
   end
 end
Now try echo "{1-}" | ./run.sh (I've annotated the output):
% ./run.sh
{1-}
^D
before while loop
Here it's trying to parse the input as an attribute, ultimately giving up:
top of loop: pos = 1 endpos = 5
pos adjusted to 1
top of loop: pos = 1 endpos = 5
top of loop: pos = 2 endpos = 5
top of loop: pos = 4 endpos = 5
before while loop
Having given up on attribute parsing, it tries a regular parse:
top of loop: pos = 1 endpos = 3
pos adjusted to 2
top of loop: pos = 2 endpos = 3
Now here's the weird part:
pos adjusted to 5
top of loop: pos = 4 endpos = 5
Notice how pos has the value 5 at the bottom of the while loop, but right at the top it has the value 4! How did that happen? There is nothing between the bottom of the loop and the top that could affect the value of pos!
pos adjusted to 5
top of loop: pos = 5 endpos = 5
pos adjusted to 6
<p>{1-}}</p>
@tarleb, do you have any Lua insights here?
OK, I see what's happening now.
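The thread does not spell out the cause. One reading consistent with the trace is the following, and it is an assumption. The second "before while loop", with no return in between, suggests that feed was entered again from inside the outer loop. Each invocation then has its own local pos. The "pos adjusted to 5" would come from the nested call, which ran past its endpos of 3. The "top of loop: pos = 4" would come from the outer call resuming with its own pos. The toy `feed` below is hypothetical and not djot's code, but it produces the same shape of trace.

```lua
local log = {}

-- Hypothetical toy `feed`: each call has its own local `pos`.
local function feed(spos, endpos, depth)
  local pos = spos
  local refed = false
  while pos <= endpos do
    log[#log + 1] = ("depth %d top of loop: pos = %d"):format(depth, pos)
    if depth == 0 and pos == 4 and not refed then
      -- Stand-in for the failed-attribute path: re-feed an earlier span,
      -- then go back to the top of this loop without touching this `pos`.
      refed = true
      feed(1, 3, depth + 1)
    else
      -- Stand-in for a matcher; at depth 1 it overshoots endpos (jumps to 5).
      if depth == 1 and pos == 2 then pos = 5 else pos = pos + 1 end
      log[#log + 1] = ("depth %d pos adjusted to %d"):format(depth, pos)
    end
  end
end

feed(4, 5, 0)
for _, line in ipairs(log) do print(line) end
--> depth 0 top of loop: pos = 4
--> depth 1 top of loop: pos = 1
--> depth 1 pos adjusted to 2
--> depth 1 top of loop: pos = 2
--> depth 1 pos adjusted to 5
--> depth 0 top of loop: pos = 4
--> depth 0 pos adjusted to 5
--> depth 0 top of loop: pos = 5
--> depth 0 pos adjusted to 6
```

Tagging each print with the call depth makes it clear which invocation a given pos belongs to.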
I had to revert this change because it was not well thought out. It led to losing content when attribute parsing fails after more than one call to feed.
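The content loss suggests that a failed attribute must be rolled back to where the "{" was seen, and that this may lie in an earlier feed call than the one where the failure happens. The Lua sketch below illustrates only that idea. `AttrReader` and its return values are hypothetical, and the attribute character set is deliberately crude. It is not how djot fixed the bug.

```lua
-- Hypothetical chunked attribute reader (not djot's API). It remembers
-- where the "{" was seen across feed calls, so on failure the caller can
-- re-parse the whole span as ordinary text.
local AttrReader = {}
AttrReader.__index = AttrReader

function AttrReader.new(subject)
  return setmetatable({ subject = subject, start = nil }, AttrReader)
end

-- Feed positions spos..endpos. Returns "done", "continue", or
-- "fail", fallback_start (first position to re-parse as plain text).
function AttrReader:feed(spos, endpos)
  for pos = spos, endpos do
    local c = self.subject:sub(pos, pos)
    if self.start == nil then
      if c ~= "{" then return "fail", pos end
      self.start = pos
    elseif c == "}" then
      return "done"
    elseif not c:match("[%w%s=#_%.%-]") then  -- crude attribute charset
      -- Roll back to where the attribute began, possibly in an earlier
      -- chunk, so that nothing already fed is lost.
      return "fail", self.start
    end
  end
  return "continue"
end

local r = AttrReader.new("{a=1 !x}")
print(r:feed(1, 4))  --> continue
print(r:feed(5, 8))  -- "fail" and 1: roll back to the "{", not to 5
```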
{1--} will hang the parser. See djot/inline.lua around line 428; in this case hyphens == 0.
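If a hyphen matcher returns its starting position when it counts zero hyphens, the caller's loop never advances. This is an assumption drawn from the note above, not a reading of line 428. The Lua sketch below guards against that case. `match_hyphens` is a hypothetical name. It counts the hyphen run without reading past endpos and always returns a position greater than pos.

```lua
-- Hypothetical hyphen matcher (not djot's actual code). Counts the run
-- of "-" starting at pos, clamped to endpos, and always returns a
-- position greater than pos so the caller's loop makes progress.
local function match_hyphens(subject, pos, endpos)
  local _, run_end = subject:find("^%-+", pos)  -- "^" anchors at pos
  local hyphens = 0
  if run_end then
    hyphens = math.min(run_end, endpos) - pos + 1
  end
  if hyphens <= 0 then
    -- Nothing to consume: advance one character instead of returning
    -- pos, which would make the caller loop forever.
    return pos + 1, 0
  end
  return pos + hyphens, hyphens
end

local nextpos, count = match_hyphens("{1--}", 3, 5)
print(nextpos, count)  -- 5 and 2: both hyphens consumed
```

Clamping to endpos also keeps the matcher inside the span it was given, which matters when the same text is fed more than once.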