rindPHI / isla

The ISLa (Input Specification Language) language & solver.
https://isla.readthedocs.org
GNU General Public License v3.0

BNF parsing translates terminals that "look like" Fuzzing Book nonterminals into nonterminals #45

Closed: rindPHI closed this issue 1 year ago.

rindPHI commented 1 year ago

Describe the bug

When parsing a BNF grammar like

<start> ::= <A>
<A> ::= "<X>"

you currently obtain the grammar

{'<start>': ['<A>'], '<A>': ['<X>']}

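where "<X>" is no longer a terminal symbol but a reference to a nonterminal. This contradicts the intended semantics of BNF, in which a quoted string is always a terminal. The underlying problem is that in the Fuzzing Book grammar format, symbols enclosed in angle brackets are interpreted as nonterminals, so the quotes that marked "<X>" as a terminal in the BNF source are lost during the translation.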
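To illustrate the ambiguity, here is a minimal sketch that scans an expansion string for nonterminals. It assumes the nonterminal pattern used by the Fuzzing Book (`<`, then any characters other than `<`, `>` or a space, then `>`); ISLa's own pattern may differ, and the helper name `nonterminals` is hypothetical.

```python
import re

# Assumption: Fuzzing Book-style nonterminal pattern; ISLa's internal regex may differ.
RE_NONTERMINAL = re.compile(r"(<[^<> ]*>)")


def nonterminals(expansion: str) -> list[str]:
    """Return every substring of an expansion that this pattern treats as a nonterminal."""
    return RE_NONTERMINAL.findall(expansion)


# The quoted BNF terminal "<X>" becomes the expansion string "<X>",
# which cannot be told apart from a reference to a nonterminal named <X>.
print(nonterminals("<A>"))    # ['<A>']
print(nonterminals("<X>"))    # ['<X>']
print(nonterminals("x < y"))  # []
```

Only strings that happen to match the pattern are affected; a terminal such as "x < y" remains unambiguous.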

To Reproduce

Run the following code snippet:

grammar = """
<start> ::= <A>
<A> ::= "<X>"
"""

from isla.language import parse_bnf
print(parse_bnf(grammar))

Expected behavior

The resulting grammar should accept exactly one word, the literal string <X>. The grammar actually produced instead contains a nonterminal <X> without any expansion rules, so it derives no word at all.
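The defect can be detected mechanically by looking for nonterminals that are used in some expansion but never defined. The following is a minimal sketch under the same assumption as above (a Fuzzing Book-style nonterminal regex); the function `undefined_nonterminals` is hypothetical and not part of ISLa.

```python
import re

# Assumption: Fuzzing Book-style nonterminal pattern, used here only for illustration.
RE_NONTERMINAL = re.compile(r"(<[^<> ]*>)")


def undefined_nonterminals(grammar: dict[str, list[str]]) -> set[str]:
    """Return nonterminals that occur in some expansion but have no rule of their own."""
    used = {
        symbol
        for expansions in grammar.values()
        for expansion in expansions
        for symbol in RE_NONTERMINAL.findall(expansion)
    }
    return used - set(grammar)


# The grammar reported above as the actual output of parse_bnf.
produced = {"<start>": ["<A>"], "<A>": ["<X>"]}
print(undefined_nonterminals(produced))  # {'<X>'}
```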

Solution idea

Whenever the regular expression for nonterminal symbols matches inside a terminal symbol, replace the < and/or > characters with new nonterminals <lt> and <gt>, whose only expansions are "<" and ">", respectively.
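The idea can be sketched as follows. This is not the code of the actual fix; it assumes a Fuzzing Book-style nonterminal regex, and the names `escape_terminal`, `ESCAPE_RULES` and `derive` are hypothetical. `derive` simply expands the first alternative of each nonterminal, which is enough to check that the escaped grammar produces the word <X>.

```python
import re

# Assumption: Fuzzing Book-style nonterminal pattern; ISLa's own may differ.
RE_NONTERMINAL = re.compile(r"(<[^<> ]*>)")

# Hypothetical helper nonterminals that each produce one literal bracket.
ESCAPE_RULES: dict[str, list[str]] = {"<lt>": ["<"], "<gt>": [">"]}


def escape_terminal(terminal: str) -> str:
    """Rewrite a quoted BNF terminal so no part of it reads as a nonterminal."""
    if not RE_NONTERMINAL.search(terminal):
        return terminal  # nothing looks like a nonterminal; keep it as-is
    # Map characters individually so the inserted "<lt>"/"<gt>" are not rewritten again.
    return "".join(
        "<lt>" if ch == "<" else "<gt>" if ch == ">" else ch for ch in terminal
    )


def derive(grammar: dict[str, list[str]], symbol: str = "<start>") -> str:
    """Expand the first alternative of each nonterminal (enough for tiny, non-recursive grammars)."""
    return RE_NONTERMINAL.sub(lambda m: derive(grammar, m.group(1)), grammar[symbol][0])


grammar = {"<start>": ["<A>"], "<A>": [escape_terminal("<X>")], **ESCAPE_RULES}
print(grammar["<A>"])   # ['<lt>X<gt>']
print(derive(grammar))  # <X>
```

This sketch checks each terminal in isolation, as the idea above describes. A stricter variant could escape every < and > in every terminal, which would also cover adjacent terminals such as "<" followed by "a>" that only look like a nonterminal once concatenated.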

rindPHI commented 1 year ago

Fixed in 0783fb4840dac558e9d8ff004f973096c25b9730 (v1.10.3).