jsinger67 / parol

LL(k) and LALR(1) parser generator for Rust
https://jsinger67.github.io/
Apache License 2.0

How to have an optional first line? #320

Closed: prokie closed this issue 6 months ago

prokie commented 6 months ago

Hi!

I am new to parsing and am trying to learn by writing a SPICE netlist parser. The first line of a SPICE netlist is an optional title string that names the circuit. How would I write that in my parol grammar file?

This is what I have so far.

%start SpiceParser
%title "SpiceParser grammar"
%comment "Empty grammar generated by `parol`"
%line_comment "//"
%auto_newline_off

%%
SpiceParser
    : { Resistor | Capacitor | Inductor | VoltageSource | CurrentSource } END
    ;

Resistor
    : "R"^ Identifier Identifier Identifier Identifier "\n"
    ;

Capacitor
    : "C"^ Identifier Identifier Identifier Identifier "\n"
    ;

Inductor
    : "L"^ Identifier Identifier Identifier Identifier "\n"
    ;

VoltageSource
    : "V"^ Identifier Identifier Identifier Identifier Identifier "\n"
    ;

CurrentSource
    : "I"^ Identifier Identifier Identifier Identifier "\n"
    ;

END : ".END"
    | ".END" "\n"^
    ;

Identifier
    : /[a-zA-Z0-9_]+/
    ;
jsinger67 commented 6 months ago

Hi @prokie, it's great to see you're using parol for learning.

To fill me in, could you please give me an example of the beginning of an input your new parser should be able to parse? That would help me understand the situation.

jsinger67 commented 6 months ago

OK, I did some research online. If I understand correctly, the optional titles are comments, aren't they? If so, you can use the %line_comment directive to define how a line comment starts:

%line_comment '*'

prokie commented 6 months ago

Oh, I actually got that wrong. The first line of a SPICE netlist is always the title, so I somehow need parol to match the first line of the file as the title of the circuit.

Example netlist
v1 1 0 dc 15
r1 1 0 2.2k
r2 1 2 3.3k     
r3 2 0 150
.end
jsinger67 commented 6 months ago

Understood. I would define a non-terminal Title that comes before the repetition in your start symbol's production.

Just a rough guess:

Title: /[^\n\r]+/ Newline;

Keep one thing in mind: you switched automatic newline handling off (%auto_newline_off), so you have to handle newlines in your grammar yourself.

jsinger67 commented 6 months ago

Here is a starting point. Have fun 🚀

%start SpiceParser
%title "SpiceParser grammar"
%comment "Empty grammar generated by `parol`"
%line_comment "\*"
%auto_newline_off

%%

SpiceParser
    : Title { Element } End
    ;

Title
    : /[^\n\r]+/ Newline^
    ;

Element
    : Resistor
    | Capacitor
    | Inductor
    | VoltageSource
    | CurrentSource
    ;

Resistor
    : 'R'^ Identifier Identifier Identifier Identifier Newline^
    ;

Capacitor
    : 'C'^ Identifier Identifier Identifier Identifier Newline^
    ;

Inductor
    : 'L'^ Identifier Identifier Identifier Identifier Newline^
    ;

VoltageSource
    : 'V'^ Identifier Identifier Identifier Identifier Identifier Newline^
    ;

CurrentSource
    : 'I'^ Identifier Identifier Identifier Identifier Newline^
    ;

End : /(?i)\.END/ [ Newline^ ]
    ;

Identifier
    : /[a-zA-Z0-9_]+/
    ;

Newline
    : /[\n\r]+/
    ;
prokie commented 6 months ago

Thanks for the starting point. I am still struggling with what I currently have, but it is a great reference. I will keep working on it and let you know if I have any questions. Thanks again.

jsinger67 commented 6 months ago

I think you'll have to modify a few things to make this parser work. First, consider ignoring case for the r, c, l, v and i prefixes, just as I did for the non-terminal End. Second, if the syntax for these elements specifies that a number follows the letter, consider including that number in the regex as well (e.g. /(?i)r\d+/); otherwise you'll struggle with whitespace, too. Unfortunately, I'm no expert on SPICE netlists.
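To illustrate what a terminal such as /(?i)r\d+/ accepts, here is a small hand-written Rust sketch (not code generated by parol) of an equivalent whole-token check: the letter is compared case-insensitively and at least one digit must follow it. The function name `matches_element` is hypothetical, and for simplicity it only accepts ASCII digits, which may be narrower than what `\d` matches in a given regex engine.

```rust
/// Hypothetical sketch: does the whole `token` match `(?i)<letter>\d+`?
/// The letter is compared ASCII case-insensitively; only ASCII digits are accepted.
fn matches_element(token: &str, letter: char) -> bool {
    let mut chars = token.chars();
    match chars.next() {
        // First character must be the element letter, in either case.
        Some(first) if first.eq_ignore_ascii_case(&letter) => {
            // The rest must be a non-empty run of digits.
            let rest = chars.as_str();
            !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit())
        }
        _ => false,
    }
}
```

With this check, `matches_element("r1", 'R')` and `matches_element("R12", 'R')` are true, while `matches_element("R", 'R')` and `matches_element("Rload", 'R')` are false, because the letter and its number form one token instead of two tokens separated by optional whitespace.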

prokie commented 6 months ago

Hi again. I reduced my problem to get a better starting point, but I don't really know how to get around the following issue.

I am only using resistors now and have skipped the nodes and identifiers. I guess I somehow need to tell the parser not to look for another title after it has found the first one.

%start Spice
%title "Spice grammar"
%comment "Empty grammar generated by `parol`"
%line_comment "\*"

%%

Spice
    : Title { Resistor } End
    ;

Title
    : "[a-zA-Z0-9]+"
    ;

End : ".END"
    ;

Resistor
    : ResistorIdentifier
    ;

ResistorIdentifier
    : "R[a-zA-Z0-9]+"
    ;

Input:

Blaa
R1
.END

This gives me the error that it expected R1 to be a title.

jsinger67 commented 6 months ago

Yes, I understand. The problem is a token conflict: the Title terminal also matches R1, so Title eats up the ResistorIdentifier. One solution could be to move the Title definition behind the ResistorIdentifier one. The other is more complicated and involves using scanner states.
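A rough model of this conflict, as a hand-written Rust sketch rather than parol's actual scanner: assume the scanner tries every terminal at the current position, keeps the longest match and, when two matches have the same length, prefers the terminal defined first. The names `Matcher`, `scan`, `title` and `resistor_id` are hypothetical; the two matchers model the terminals `[a-zA-Z0-9]+` and `R[a-zA-Z0-9]+` from the reduced grammar.

```rust
/// A terminal matcher returns the length of its match at the start of the input (0 = no match).
type Matcher = fn(&str) -> usize;

/// Length of the leading run of ASCII letters and digits.
fn alnum_run(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_alphanumeric()).count()
}

/// Models the terminal `[a-zA-Z0-9]+` (Title).
fn title(s: &str) -> usize {
    alnum_run(s)
}

/// Models the terminal `R[a-zA-Z0-9]+` (ResistorIdentifier).
fn resistor_id(s: &str) -> usize {
    match s.strip_prefix('R') {
        Some(rest) if alnum_run(rest) > 0 => 1 + alnum_run(rest),
        _ => 0,
    }
}

/// Returns the name of the winning terminal: the longest match wins,
/// and ties are resolved in favor of the terminal listed earlier.
fn scan<'a>(input: &str, terminals: &[(&'a str, Matcher)]) -> Option<&'a str> {
    let mut best: Option<(&'a str, usize)> = None;
    for &(name, matcher) in terminals {
        let len = matcher(input);
        // Strictly greater: an equal-length later terminal never replaces an earlier one.
        if len > 0 && best.map_or(true, |(_, best_len)| len > best_len) {
            best = Some((name, len));
        }
    }
    best.map(|(name, _)| name)
}
```

Under this model, `scan("R1", &[("Title", title as Matcher), ("ResistorIdentifier", resistor_id as Matcher)])` yields `Some("Title")`, and swapping the two entries yields `Some("ResistorIdentifier")`. Reordering helps here, but under the same model a title that itself starts with R followed by letters or digits (for example Rectifier) would then be scanned as a ResistorIdentifier, which is where scanner states come in.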

jsinger67 commented 6 months ago

Hi @prokie,

here is a grammar that worked for me with your first example. I hope it helps.

%start SpiceParser
%title "SpiceParser grammar"
%comment "Empty grammar generated by `parol`"
%line_comment "\*"
%auto_newline_off

%scanner TitleScanner {
    %line_comment "\*"
    %auto_newline_off
}

%%

SpiceParser
    : Title { Element } End
    ;

Title
    : [ Newline^ ] %push(TitleScanner) NonNewline Newline^ %pop()
    ;

Element
    : Resistor
    | Capacitor
    | Inductor
    | VoltageSource
    | CurrentSource
    ;

Resistor
    : RElem^ Identifier Identifier Identifier Newline^
    ;

Capacitor
    : CElem^ Identifier Identifier Identifier Newline^
    ;

Inductor
    : LElem^ Identifier Identifier Identifier Newline^
    ;

VoltageSource
    : VElem^ Identifier Identifier Identifier Identifier Newline^
    ;

CurrentSource
    : IElem^ Identifier Identifier Identifier Newline^
    ;

End : /(?i)\.END(?-i)/ [ Newline^ ]
    ;

RElem
    : /(?i)R\d+(?-i)/
    ;

CElem
    : /(?i)C\d+(?-i)/
    ;

LElem
    : /(?i)L\d+(?-i)/
    ;

VElem
    : /(?i)V\d+(?-i)/
    ;

IElem
    : /(?i)I\d+(?-i)/
    ;

Identifier
    : /[a-zA-Z0-9_\.]+/
    ;

Newline
    : <INITIAL, TitleScanner>/[\n\r]+/
    ;

NonNewline
    : <TitleScanner>/[^\n\r]+/
    ;
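The effect of %push(TitleScanner) ... %pop() in the Title production can be pictured with a hand-written Rust sketch (not parol's generated scanner): while the title mode is active, the whole rest of the line is a single token, like NonNewline; afterwards the default mode splits each line into whitespace-separated words. The `Token` type and `tokenize` function are hypothetical, leading empty lines are skipped to mimic the optional [ Newline^ ], and comments are not handled at all.

```rust
/// Tokens produced by this hypothetical two-mode tokenizer.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    /// The whole title line (like NonNewline in TitleScanner).
    NonNewline(&'a str),
    /// A whitespace-separated word scanned in the default mode.
    Word(&'a str),
}

fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut lines = input.lines();
    // "Title mode": skip leading empty lines, then take the entire next line as one token.
    if let Some(title) = lines.by_ref().find(|line| !line.is_empty()) {
        tokens.push(Token::NonNewline(title));
    }
    // Back in the default mode: every remaining line is split on whitespace.
    for line in lines {
        tokens.extend(line.split_whitespace().map(Token::Word));
    }
    tokens
}
```

With this split, the input "R1 divider\nr1 1 0 100\n.end" produces the title token "R1 divider" followed by the words "r1", "1", "0", "100" and ".end", so a title that looks like an element no longer collides with the element terminals.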

Keep in mind that, in your grammar processing, you have to extract the first identifier (the part that comes directly after the r, c, l, v or i) from the token's text itself, because it is now part of the element token (RElem, CElem, and so on). Let me know if you need further assistance.
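For that extraction step, a minimal sketch in plain Rust, assuming you already have the matched token text (for example "r1" for RElem): the letter is normalized to upper case and the digits after it are parsed as the element's number. The helper `split_element_token` is hypothetical and not part of parol's generated API; you would call something like it wherever you process the element tokens.

```rust
/// Hypothetical helper: splits the text of an element token such as `r1` or `C12`
/// into its upper-cased element letter and its numeric designator.
fn split_element_token(text: &str) -> Option<(char, u32)> {
    let mut chars = text.chars();
    let kind = chars.next()?.to_ascii_uppercase();
    if !matches!(kind, 'R' | 'C' | 'L' | 'V' | 'I') {
        return None;
    }
    let digits = chars.as_str();
    // Mirror `\d+`: at least one digit and nothing else (ASCII only in this sketch).
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // `parse` can still fail on overflow, which also yields `None`.
    digits.parse::<u32>().ok().map(|number| (kind, number))
}
```

For example, `split_element_token("r1")` returns `Some(('R', 1))` and `split_element_token("C12")` returns `Some(('C', 12))`, while text that the element regexes would not match, such as "X1" or "R", returns `None`.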

jsinger67 commented 6 months ago

I'm closing this issue. If you need further help, please let me know.