yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

Error reporting questions #234

Closed by kfsone 2 years ago

kfsone commented 2 years ago

This is not a defect report, just a request for guidance.

I'm looking to add nested comments, and I'd like to be able to take the input:

/* line 1:1 is the first comment open
 /* line 2:2 is the second
  /* line 3:3 and so on
   /* line 4:4
    /* line 5:5
*/  // closes line 5:5  (this line has a newline, so the EOI is at 7:1)

and report

error: 7:1: unexpected end of input while 4 block comments remain open:
note: 4:4: comment opened here
note: 3:3: comment opened here
note: 2:2: comment opened here
note: 1:1: comment opened here

or at least

error: 7:1: unterminated comment
error: 4:4: comment opened here

The line split is important so that I can follow the FLClm (file-line-column: level: message) convention, which lets IDEs take users to the source location:

/users/badprogrammer/terribleness/hideous.cpp:7:1: error: unexpected end of input...
/users/badprogrammer/terribleness/hideous.cpp:4:4: note: comment opened here
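As a rough illustration of that convention, a formatter for such lines could look like the C++ sketch below. `Diagnostic` and `format_diagnostic` are hypothetical names made up for this example and are not part of cpp-peglib; line and column are assumed to be 1-based.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record for one diagnostic line (not a cpp-peglib type).
struct Diagnostic {
  std::size_t line;     // 1-based line number
  std::size_t column;   // 1-based column number
  std::string level;    // e.g. "error" or "note"
  std::string message;  // human-readable text
};

// Renders "file:line:column: level: message", the shape shown above.
std::string format_diagnostic(const std::string &file, const Diagnostic &d) {
  assert(d.line >= 1 && d.column >= 1);  // positions are assumed 1-based
  return file + ":" + std::to_string(d.line) + ":" +
         std::to_string(d.column) + ": " + d.level + ": " + d.message;
}
```

Emitting one such line per open comment, innermost first, would give the error-plus-notes listing I'm after.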

Sample:

program <- (~NL / expr)* ~EOI

~BLOCK_COMMENT  <- '/*' ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* ('*/'^unterminated_comment)
~LINE_COMMENT   <- '//' [^\n]*
~NOISE          <- [ \f\r\t] / BLOCK_COMMENT

EOI             <- !.
NL              <- NOISE* LINE_COMMENT? '\n'

# error recovery
unterminated_comment <- EOI { message "unterminated block comment" }

expr <- 'hello'

I looked at using parser["BLOCK_COMMENT"].enter/.leave but I didn't see a way to capture the positions; I looked at %recovery/label{message} but it only appears to be able to tell me where the unexpected symbol is.

This made me think that I might need to use a negative lookahead?

# NLAv1
~BLOCK_COMMENT  <- '/*' (!EOI^unterminated_comment ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] ))* '*/'

This still reports the error at the EOI.

# NLAv2
~BLOCK_COMMENT  <- '/*' (!EOI^unterminated_comment / ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* '*/')^unterminated_comment

An improvement: this tells me the location of the outermost unterminated comment, but I really don't like the look of the (!EOI / ...)*. Also, would this be an appropriate place for a cut?

# NLAv3
~BLOCK_COMMENT  <- '/*' ↑ (!EOI^unterminated_comment / ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* '*/')^unterminated_comment

So perhaps:

# NLAv4
~BLOCK_COMMENT <- '/*' TERMINATED_COMMENT^unterminated_comment
~TERMINATED_COMMENT <- ('/'+[^*/]+ / BLOCK_COMMENT / '*'+[^*/]+ / [^*/] )* '*/'

This gives me the outermost unterminated comment:

// this is a comment
   /* this is a comment /* // and this doesn't stop it   <-- 2:6 is here
      so this is also a comment // /* as is this // */
      we're still commenting /*  <- this is also unterminated

hello

2:6 unterminated block comment

Being able to somehow indicate the depth would be really helpful -- nested block comments are far more complex than I imagined because, for instance, strings and escapes aren't taken into account, line comments don't factor in, or if they did...

/* printf("unterminated comment: did you forget the '*/'?"); */  # oops
vs
// printf("unterminated comment: did you forget the '/*?'"); */  # ok
/*
  // I commented this line out */
/*
  // I'm commenting this comment out /* the cake is a lie */
*/
  printf("// disallow /* in strings");  // */ not required

I'm going to go stick my head in a bucket of cold water now. If you close this ticket and pretend it never happened, I'll totally understand :)

yhirose commented 2 years ago

@kfsone, I enhanced enter and leave, so nested blocks can now be supported.

https://github.com/yhirose/cpp-peglib/blob/c9090b66615713c014476c552a93c38398ae642f/test/test2.cc#L2177-L2238
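Independently of cpp-peglib's API, the bookkeeping behind "push on enter, pop on leave" can be sketched as a small standalone scanner. This is only an illustration of the stack idea, not the code in the linked test: `Position`, `ScanResult`, and `scan_nested_comments` are hypothetical names, columns are counted in bytes, and, as noted in the question, line comments and string literals are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// 1-based source position (hypothetical type for this sketch).
struct Position {
  std::size_t line;
  std::size_t column;
};

struct ScanResult {
  std::vector<Position> open;  // comments still open at EOI, outermost first
  Position end;                // position of the end of input
};

// Walks the input once, pushing the position of every "/*" and popping
// on every "*/", so whatever is left on the stack is unterminated.
ScanResult scan_nested_comments(const std::string &src) {
  ScanResult r{{}, {1, 1}};
  std::size_t i = 0;
  while (i < src.size()) {
    const bool has_next = i + 1 < src.size();
    if (has_next && src[i] == '/' && src[i + 1] == '*') {
      r.open.push_back(r.end);  // remember where this comment opened
      i += 2;
      r.end.column += 2;
    } else if (has_next && src[i] == '*' && src[i + 1] == '/' &&
               !r.open.empty()) {
      r.open.pop_back();  // closes the innermost open comment
      i += 2;
      r.end.column += 2;
    } else if (src[i] == '\n') {
      ++i;
      ++r.end.line;
      r.end.column = 1;
    } else {
      ++i;
      ++r.end.column;
    }
  }
  return r;
}
```

Walking `open` from back to front yields the innermost-first "comment opened here" notes, and `end` gives the position for the "unexpected end of input" error. In a parser, the same stack could presumably live in the user data passed to the enter and leave handlers.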