igordejanovic / parglare

A pure Python LR/GLR parser - http://www.igordejanovic.net/parglare/
MIT License

advice requested on implementing a language "feature" #145

Open terryheidelberg opened 1 year ago

terryheidelberg commented 1 year ago

Description

I am trying to translate an old dialect of BCPL into a more recent version. One quirk of this language is that certain language elements may be omitted in certain situations. My current approach is to "inject" the omitted elements into the input stream, but that doesn't look possible with the current implementation of parglare custom recognizers, since input_str is of type str and thus immutable.
Is that correct? If so, is there another way of attacking this problem? Thanks.

Here is the relevant doc extract, edited from the old-dialect manual: Insertion of missing symbols during parse:

 (1) The symbol DO is inserted between pairs of items if they appear on the same line and 
 if the first is from the set of items which may end an expression, namely:
                )     element     ]

 and the second is from the set of items which must start a command, namely:
                TEST FOR IF UNLESS UNTIL WHILE GOTO
                RESULTIS CASE DEFAULT BREAK RETURN 
                FINISH SWITCHON   [

 (2) The compiler inserts a semicolon between adjacent items if they appear on 
 different lines and if the first is from the set of symbols which may end a command, namely:
                BREAK RETURN FINISH REPEAT
                )     element     ]

 and the second is from the set of items which may start a command,  namely:
                 TEST FOR IF UNLESS UNTIL WHILE GOTO
                 SWITCHON   (   RV   element
                 RESULTIS CASE DEFAULT BREAK RETURN
                 FINISH    [

Where "element" in the above text means:
element : character_constant | string_constant | number | Identifier | "TRUE" | "FALSE" ;

igordejanovic commented 1 year ago

I think there are two viable approaches:

  1. Preprocessing of the input before parsing and inserting required text - this might be tricky if context-free recognition is required to check for the insertion conditions.
  2. Use custom recognizers to return "virtual tokens".
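
As a rough illustration of the first approach, the two insertion rules quoted from the manual could be applied to the raw text before it reaches the parser. The sketch below is only that: the token regex, the quoting conventions, the `KEYWORDS` set and the names `is_element` and `insert_missing` are assumptions made for the example, not part of parglare or of the old-dialect manual. Comments in the BCPL source are not handled, and any reserved word missing from `KEYWORDS` would be wrongly treated as an identifier.

```python
import re

# Symbol sets transcribed from the manual extract; "element" is tested separately.
ENDS_EXPR = {")", "]"}
MUST_START_CMD = {"TEST", "FOR", "IF", "UNLESS", "UNTIL", "WHILE", "GOTO",
                  "RESULTIS", "CASE", "DEFAULT", "BREAK", "RETURN",
                  "FINISH", "SWITCHON", "["}
ENDS_CMD = {"BREAK", "RETURN", "FINISH", "REPEAT", ")", "]"}
MAY_START_CMD = MUST_START_CMD | {"(", "RV"}

# ASSUMPTION: incomplete reserved-word list; extend it for the real dialect.
KEYWORDS = MUST_START_CMD | ENDS_CMD | MAY_START_CMD | {"DO"}

# ASSUMPTION: crude lexical rules (quoting, identifier shape); no comment handling.
TOKEN = re.compile(r'"[^"\n]*"|\'[^\'\n]*\'|[A-Za-z][A-Za-z0-9.]*|\d+|:=|\S')


def is_element(tok):
    """True for a constant, number, identifier, TRUE or FALSE."""
    if tok[0].isdigit():
        return True
    if tok[0] in "\"'":
        return len(tok) >= 2
    return tok[0].isalpha() and tok not in KEYWORDS


def insert_missing(source):
    """Return source with the omitted DO and ';' symbols written out."""
    inserts = []  # (offset into source, text to insert)
    prev = None   # (text, line index, end offset) of the previous token
    for m in TOKEN.finditer(source):
        text = m.group()
        line = source.count("\n", 0, m.start())
        if prev is not None:
            p_text, p_line, p_end = prev
            if line == p_line:
                # Rule (1): same line, expression end followed by command start.
                if ((p_text in ENDS_EXPR or is_element(p_text))
                        and text in MUST_START_CMD):
                    inserts.append((m.start(), "DO "))
            elif ((p_text in ENDS_CMD or is_element(p_text))
                    and (text in MAY_START_CMD or is_element(text))):
                # Rule (2): different lines, command end followed by command start.
                inserts.append((p_end, ";"))
        prev = (text, line, m.end())
    # Apply from the end so that earlier offsets stay valid.
    for offset, piece in sorted(inserts, reverse=True):
        source = source[:offset] + piece + source[offset:]
    return source
```

With these assumptions, `insert_missing("IF x = 0 RETURN\ny := 1")` returns `"IF x = 0 DO RETURN;\ny := 1"`. Working on tokens rather than raw characters keeps the rules close to the manual's wording, but it only checks adjacent token pairs, so any condition that needs real context-free recognition is out of its reach.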

I think the second option would be "less hacky". The problem is that parglare currently expects recognizers to return a slice of the input. In this case that slice would be empty, so parglare takes that to mean an unrecognized token. See here. What you can try as a workaround is to subclass the list type and make it evaluate to True in a boolean context even when it is empty (search for the __bool__ dunder method). Then you return an empty instance of your new list if the token is not in the input but the conditions for its insertion are satisfied. This will make the test in parglare pass, so the token is treated as non-empty, while its length is still 0, which is important for the parser when advancing the position.

This is just an idea, I haven't tested it.
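
A minimal sketch of the truthy-but-empty list described above might look like the following. It is untested against parglare itself. The names `VirtualToken` and `accept` are hypothetical, and `accept` only imitates the kind of check described in the comment (a falsy result means "no token", otherwise the position advances by the token's length). It is not parglare's actual code.

```python
class VirtualToken(list):
    """An empty list that is still truthy (hypothetical helper name)."""

    def __bool__(self):
        # Python consults __bool__ before __len__ when testing truth,
        # so an instance is truthy even though len(instance) == 0.
        return True


def accept(tok, pos):
    """Stand-in for the parser-side test: None if unrecognized, else new position."""
    if not tok:
        return None
    return pos + len(tok)


# A plain empty list looks like "nothing recognized" ...
assert accept([], 5) is None
# ... while the virtual token is accepted and consumes no input.
assert accept(VirtualToken(), 5) == 5
```

A custom recognizer would then return `VirtualToken()` when the symbol is absent from the input but the insertion conditions hold. Since `input_str` is a `str`, the same `__bool__` override on a `str` subclass may be a closer fit than a list. Whether parglare accepts either type everywhere it handles token values is an assumption that would need checking.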