jgm / djot

A light markup language
https://djot.net
MIT License

Parser hangs on input #104

Closed jgm closed 1 year ago

jgm commented 1 year ago

{1--} will hang the parser.

See djot/inline.lua around line 428; in this case hyphens == 0.

jgm commented 1 year ago

Found with make fuzz.

jgm commented 1 year ago

The inline parser should be able to monitor itself for lack of progress and abort instead of looping.
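A minimal sketch of what such a guard could look like, in Lua. The names run_with_progress_guard and step are hypothetical and are not part of djot; step stands in for whichever matcher handles the character at pos and returns the position to resume from. The only idea shown is that every iteration must move pos forward, and that a violation becomes an error rather than an endless loop.

```lua
-- Hypothetical sketch; not djot's actual code.
-- `step(pos, endpos)` is an assumed callback that returns the next position.
local function run_with_progress_guard(step, spos, endpos)
  local pos = spos
  while pos <= endpos do
    local newpos = step(pos, endpos)
    if type(newpos) ~= "number" or newpos <= pos then
      -- No forward progress: abort instead of looping forever.
      error(string.format("no progress at position %d (step returned %s)",
                          pos, tostring(newpos)))
    end
    pos = newpos
  end
  return pos
end

-- A well-behaved step advances by one and terminates.
local final = run_with_progress_guard(function(p) return p + 1 end, 1, 5)

-- A stuck step returns the same position; the guard turns the hang into an error.
local ok, err = pcall(run_with_progress_guard, function(p) return p end, 1, 5)
```

Raising an error keeps the failure visible to a fuzzer such as make fuzz; whether a real parser should abort or fall back to emitting the character as plain text is a separate design choice.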

jgm commented 1 year ago

Also relevant: this one gives an overlapping match:

{1-}
+para 1-1
str 1-4
str 4-4
-para 5-5

jgm commented 1 year ago

I have improved things somewhat (no more hangs), but now we have the same doubled character noted above for {1-} with {1--}. The problem is quite mysterious.

Add prints as follows in djot/inline.lua:

--- a/djot/inline.lua
+++ b/djot/inline.lua
@@ -497,7 +497,9 @@ function Tokenizer:feed(spos, endpos)
     self.lastpos = endpos
   end
   pos = spos
+  print("before while loop")
   while pos <= endpos do
+    print("top of loop: pos = ", pos, "endpos = ", endpos)
     if self.attribute_tokenizer then
       local sp = pos
       local ep2 = bounded_find(subject, special, pos, endpos) or endpos
@@ -590,6 +592,7 @@ function Tokenizer:feed(spos, endpos)
         local matcher = matchers[c]
         pos = (matcher and matcher(self, pos, endpos)) or self:single_char(pos)
       end
+      print("pos adjusted to ", pos)
     end
   end
 end

Now try echo "{1-}" | ./run.sh (I've annotated the output):

% ./run.sh
{1-}
^D
before while loop

Here it's trying to parse the input as an attribute, ultimately giving up:

top of loop: pos =  1   endpos =    5
pos adjusted to     1
top of loop: pos =  1   endpos =    5
top of loop: pos =  2   endpos =    5
top of loop: pos =  4   endpos =    5
before while loop

Having given up on attribute parsing, it tries a regular parse:

top of loop: pos =  1   endpos =    3
pos adjusted to     2
top of loop: pos =  2   endpos =    3

Now here's the weird part:

pos adjusted to     5
top of loop: pos =  4   endpos =    5

Notice how pos has the value 5 at the bottom of the while loop, but right at the top it has the value 4! How did that happen???? There is nothing between the bottom of the loop and the top that could affect the value of pos!

pos adjusted to     5
top of loop: pos =  5   endpos =    5
pos adjusted to     6
<p>{1-}}</p>
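One possible reading of the trace, not confirmed anywhere in this thread: "before while loop" is printed twice, so feed is entered a second time before the first activation has finished. If the second call is nested inside the first, each activation has its own pos, and the prints from the two activations interleave. The toy below is a hypothetical stand-in, not djot's Tokenizer; the nested flag, the retried flag, and the jump from position 2 to 5 are assumptions chosen only to reproduce the shape of the output.

```lua
-- Hypothetical toy; not djot's actual code.
local log = {}
local function emit(fmt, ...) log[#log + 1] = string.format(fmt, ...) end

local function feed(spos, endpos, nested)
  local pos = spos        -- each activation of feed gets its own `pos`
  local retried = false
  emit("before while loop")
  while pos <= endpos do
    emit("top of loop: pos = %d endpos = %d", pos, endpos)
    if not nested and not retried then
      -- Assumed fallback: re-parse an earlier range in a fresh activation.
      -- This branch neither advances `pos` nor logs "pos adjusted".
      retried = true
      feed(1, 3, true)
    else
      -- Assumed matcher behavior: at position 2, jump to position 5.
      pos = (pos == 2) and 5 or pos + 1
      emit("pos adjusted to %d", pos)
    end
  end
end

feed(4, 5, false)
print(table.concat(log, "\n"))
```

In this toy, "pos adjusted to 5" is logged by the inner activation just before it returns, and the next "top of loop: pos = 4" comes from the outer activation, whose pos never changed. Nothing modified a variable between the bottom and the top of a single loop; the two lines belong to different loops.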

jgm commented 1 year ago

@tarleb do you have any Lua insights here?

jgm commented 1 year ago

OK, I see what's happening now.

jgm commented 1 year ago

I had to revert this change because it was not well thought out. It led to losing content when attribute parsing fails after more than one 'feed'.