kaby76 / g4-scripts

MIT License

Add script to detect and clean up non-idiomatic grammars. #4

Open kaby76 opened 1 month ago

kaby76 commented 1 month ago

See https://github.com/antlr/grammars-v4/issues/4291#issuecomment-2437613843

kaby76 commented 1 month ago

Bison/Yacc has no equivalent of an optional element, i.e., there is no ?-operator, so an optional construct is written as a rule with an empty alternative. https://www.gnu.org/software/bison/manual/bison.html

After converting a Bison grammar to Antlr, it is best to convert the non-idiomatic usages to idiomatic Antlr. There should be scripts that remove the empty alternative from a rule and apply the ?-operator wherever that rule is used.
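The intended rewrite can be sketched on a toy grammar model. This is only an illustration, not the actual scripts: the dict-of-alternatives representation and the name `make_optional` are assumptions made for the example. A rule with an empty alternative loses that alternative (when other alternatives remain), and every plain reference to the rule gains a `?`.

```python
def make_optional(grammar):
    """Return a copy of `grammar` with empty alternatives moved to use sites.

    `grammar` is a hypothetical model: rule name -> list of alternatives,
    each alternative a list of symbol strings (an empty list is an empty alt).
    """
    # Rules that contain at least one top-level empty alternative.
    targets = [name for name, alts in grammar.items()
               if any(len(alt) == 0 for alt in alts)]
    result = {name: [list(alt) for alt in alts] for name, alts in grammar.items()}
    for target in targets:
        # Make every plain (un-suffixed) reference to the rule optional.
        for alts in result.values():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym == target:
                        alt[i] = target + "?"
        # Drop the empty alternative, but only if another alternative remains.
        non_empty = [alt for alt in result[target] if alt]
        if non_empty:
            result[target] = non_empty
    return result


# Bison style:  stmt : expr opt_semi ;   opt_semi : SEMI | ;
bison_style = {
    "stmt": [["expr", "opt_semi"]],
    "opt_semi": [["SEMI"], []],
}
# Idiomatic:    stmt : expr opt_semi? ;  opt_semi : SEMI ;
idiomatic = make_optional(bison_style)
```

References that already carry a suffix (for example `opt_semi*`) are left alone in this sketch, since they do not match the bare rule name.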

So far the scripts are:

* Fix:

#
set -x
set -e
set -o pipefail
for dir in `find . -name desc.xml | sed 's#/desc.xml##' | sort -u`
do
echo $dir
# Find rules that contain top-level empty alts.
# Note: this is not complete, because an alt may be non-empty yet still derive empty.
dotnet trparse -l $dir/*.g4 2>/dev/null > save.pt
cat save.pt | dotnet trxgrep ' //parserRuleSpec[./ruleBlock/ruleAltList/labeledAlt/alternative[count(./*) = 0]]/RULE_REF/text()' > rules.txt
# Find uses of each rule that have no operator applied, and make them optional.
for r in `cat rules.txt`
do
    rr=`echo $r | tr -d '\n' | tr -d '\r'`
    echo Working on $rr
    dotnet trparse $dir/*.g4 2>/dev/null | dotnet trquery replace ' //parserRuleSpec/ruleBlock//element[./atom and not(./ebnfSuffix)]/atom/ruleref/RULE_REF[./text() = "'$rr'"]' "'$rr?'" | dotnet trsponge -o $dir -c
    # Remove empty alt in the rule.
    dotnet trparse $dir/*.g4 2>/dev/null | dotnet trquery delete "  //parserRuleSpec[RULE_REF/text() = '$rr']/ruleBlock/ruleAltList/labeledAlt[alternative/count(./*) = 0 and ./preceding-sibling::*[last()]/self::OR]/(. | ./preceding-sibling::OR[last()])" | dotnet trsponge -o $dir -c
    dotnet trparse $dir/*.g4 2>/dev/null | dotnet trquery delete "  //parserRuleSpec[RULE_REF/text() = '$rr']/ruleBlock/ruleAltList/labeledAlt[alternative/count(./*) = 0 and ./following-sibling::*[1]/self::OR]/(. | ./following-sibling::OR[1])" | dotnet trsponge -o $dir -c
done

done


* Rename:

kaby76 commented 1 month ago

After cleaning up the non-idiomatic empty productions, I wrote a trnullable app to determine whether a parser rule is nullable. It works purely by a visitor analysis over the grammar AST, in contrast to the ATN-based method used in the Antlr4 Tool (checkEpsilonClosure). This is fine because the two are dual solutions to the same problem. (Note: there was a paper that made the same observation, that you never need to construct ATNs.)
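A grammar-only nullability analysis can be sketched as a fixed-point computation. This is not the trnullable implementation: the toy grammar model and the names `symbol_nullable` and `nullable_rules` are assumptions made for illustration. A rule is nullable if some alternative consists only of symbols that can derive empty; no ATN is constructed.

```python
def symbol_nullable(sym, nullable):
    """A symbol derives empty if it is suffixed with ? or *, or if it
    refers (optionally with +) to a rule already known to be nullable."""
    if sym.endswith("?") or sym.endswith("*"):
        return True
    base = sym[:-1] if sym.endswith("+") else sym
    return base in nullable


def nullable_rules(grammar):
    """Return the set of nullable rule names.

    `grammar` is a hypothetical model: rule name -> list of alternatives,
    each alternative a list of symbol strings. Tokens never appear as keys,
    so they are never nullable.
    """
    nullable = set()
    changed = True
    while changed:  # iterate until no new nullable rule is found
        changed = False
        for name, alts in grammar.items():
            if name in nullable:
                continue
            # An empty alternative is trivially nullable: all([]) is True.
            if any(all(symbol_nullable(s, nullable) for s in alt) for alt in alts):
                nullable.add(name)
                changed = True
    return nullable


grammar = {
    "file": [["decl*", "EOF"]],
    "decl": [["modifiers", "ID"]],
    "modifiers": [["modifier*"]],
    "modifier": [["PUBLIC"], ["STATIC"]],
    "list": [["modifiers", "tail?"]],
    "tail": [["COMMA", "ID"]],
}
result = nullable_rules(grammar)
```

Here `modifiers` is nullable even though it has no empty alternative, which is the case the empty-alternative search in the script above does not detect.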