yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

Parser can't detect end of file. #245

Closed SamuelMartens closed 2 years ago

SamuelMartens commented 2 years ago

Hello. I've been using peglib for a year and recently decided to update it. This piece of grammar used to work, but now it gets stuck in an infinite loop somewhere, even when run on an empty file.

File <- (Instruction / Code)*

Instruction <- T( DefStart IncludeInstr)
Code        <- T( !DefStart ( !EndOfLine . )*)

# --- Instructions ---
IncludeInstr <- T( 'include' _ '"' < Word '.' Word > '"' )

# --- Basic definitions ---
DefStart    <- _ '@' _
Word        <- T( < [a-zA-Z_][a-zA-Z_]* > )

# --- Whitespaces ---
~_   <- Spacing*

Spacing   <- (Space / Comment)
Comment   <- '//' ( !EndOfLine . )* EndOfLine
Space     <- ' ' / '\t' / EndOfLine
EndOfLine <- '\r\n' / '\n' / '\r'

# --- Parser macro ---
T(x)        <- (_ x _) 

I've found that this problem can be solved if I change the Code definition to this:

Code <- T( !DefStart ( !EndOfLine . )* EndOfLine )

This makes me think the parser does reach the end of the file at some point, but gets stuck there if the current definition has not been completed. Thank you.

yhirose commented 2 years ago

@SamuelMartens, thanks for the report. The former versions actually had a number of problems and the latest version has improvements.

I put your grammar in a.peg and ran peglint a.peg --source '' --trace to see what was going on inside the parser. I found that Code <- T( !DefStart ( !EndOfLine . )*) can match an empty string without advancing the current position. Because File repeats Code with *, that zero-length match succeeds again and again at the same position, which is why the parser falls into an infinite loop.

I changed the Code definition to Code <- T( !DefStart ( !EndOfLine . )+), and I confirmed that it fixes the problem. Is this change acceptable?
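
The difference between the * and + forms of Code can be illustrated with a small hand-rolled sketch. It does not use the cpp-peglib API. The function names (skip_spacing, match_code, count_code_matches) and the iteration cap are hypothetical, and the grammar is simplified on the assumption that comments, !DefStart and the Instruction alternative do not matter for this point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical hand-rolled matchers for illustration only; not cpp-peglib API.
// Each matcher returns the position after the match, or kFail on failure.
constexpr std::size_t kFail = std::string::npos;

// Simplified _ <- Space*  (comments are omitted in this sketch)
std::size_t skip_spacing(const std::string& s, std::size_t pos) {
    while (pos < s.size() &&
           (s[pos] == ' ' || s[pos] == '\t' || s[pos] == '\n' || s[pos] == '\r')) {
        ++pos;
    }
    return pos;
}

// Simplified Code <- _ ( !EndOfLine . )* _    when require_one is false
//            Code <- _ ( !EndOfLine . )+ _    when require_one is true
std::size_t match_code(const std::string& s, std::size_t pos, bool require_one) {
    pos = skip_spacing(s, pos);
    const std::size_t start = pos;
    while (pos < s.size() && s[pos] != '\n' && s[pos] != '\r') {
        ++pos;  // consume any character that is not an end of line
    }
    if (require_one && pos == start) {
        return kFail;  // the + form must consume at least one character
    }
    return skip_spacing(s, pos);
}

// Simplified File <- Code*. A real repetition would never stop if Code keeps
// succeeding, so the cap only makes that non-termination observable here.
// Returns the number of Code matches, or -1 if the cap was reached.
int count_code_matches(const std::string& s, bool require_one, int cap = 1000) {
    std::size_t pos = 0;
    int matches = 0;
    while (true) {
        const std::size_t next = match_code(s, pos, require_one);
        if (next == kFail) {
            return matches;  // Code failed, so the repetition ends normally
        }
        if (++matches >= cap) {
            return -1;  // Code never failed: the loop would run forever
        }
        pos = next;
    }
}
```

In this sketch, count_code_matches("", false) reaches the cap because the * form succeeds at position 0 without consuming anything, while count_code_matches("", true) stops immediately with zero matches. The same happens at the end of a non-empty input: once every character has been consumed, the * form keeps succeeding on the empty remainder, and the + form fails and lets the repetition finish.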

yhirose commented 2 years ago

@SamuelMartens I am going to close it because the parser behavior in the latest cpp-peglib is actually correct. Thanks!