JsonException trying to convert NCalc grammar file from ANTLR3 to ANTLR4 #260

Closed markcanary closed 1 year ago

markcanary commented 1 year ago

I am trying to convert a grammar file for the NCalc Async library from ANTLR3 to ANTLR4 so I can switch NCalc Async to use the ANTLR4 library. I am not very familiar with ANTLR so I was hoping to use this utility to do the conversion. Does anyone know how I can get this to work? Below is the grammar file. I am using the following command line:

trparse NCalc.g3 | trconvert | trprint

This produces the following exception:

System.Text.Json.JsonException: The JSON value could not be converted to AntlrJson.ParsingResultSet[]. Path: $ | LineNumber: 0 | BytePositionInLine: 20024.
   at AntlrJson.ParseTreeConverter.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\src\AntlrJson\ParseTreeConverter.cs:line 213
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 json, JsonTypeInfo jsonTypeInfo)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](String json, JsonSerializerOptions options)
   at Trash.Command.Execute(Config config) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\src\trprint\Command.cs:line 44
   at Trash.Program.MainInternal(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\src\trprint\Program.cs:line 72
   at Trash.Program.Main(String[] args) in C:\Users\Kenne\Documents\GitHub\Domemtech.Trash\src\trprint\Program.cs:line 16


grammar NCalc;


@header {
using System;
using System.Text;
using System.Globalization;
using NCalcAsync.Domain;
}

@members {
private const char BS = '\\';
private static NumberFormatInfo numberFormatInfo = new NumberFormatInfo();

private string extractString(string text) {

    StringBuilder sb = new StringBuilder(text);
    int startIndex = 1; // Skip initial quote
    int slashIndex = -1;

    while ((slashIndex = sb.ToString().IndexOf(BS, startIndex)) != -1)
    {
        char escapeType = sb[slashIndex + 1];
        switch (escapeType)
        {
            case 'u':
              string hcode = String.Concat(sb[slashIndex+4], sb[slashIndex+5]);
              string lcode = String.Concat(sb[slashIndex+2], sb[slashIndex+3]);
              char unicodeChar = Encoding.Unicode.GetChars(new byte[] { System.Convert.ToByte(hcode, 16), System.Convert.ToByte(lcode, 16)} )[0];
              sb.Remove(slashIndex, 6).Insert(slashIndex, unicodeChar);
              break;
            case 'n': sb.Remove(slashIndex, 2).Insert(slashIndex, '\n'); break;
            case 'r': sb.Remove(slashIndex, 2).Insert(slashIndex, '\r'); break;
            case 't': sb.Remove(slashIndex, 2).Insert(slashIndex, '\t'); break;
            case '\'': sb.Remove(slashIndex, 2).Insert(slashIndex, '\''); break;
            case '\\': sb.Remove(slashIndex, 2).Insert(slashIndex, '\\'); break;
            default: throw new RecognitionException("Unvalid escape sequence: \\" + escapeType);
        }

        startIndex = slashIndex + 1;

    }

    sb.Remove(0, 1);
    sb.Remove(sb.Length - 1, 1);

    return sb.ToString();
}

public List<string> Errors { get; private set; }

public override void DisplayRecognitionError(String[] tokenNames, RecognitionException e) {

    base.DisplayRecognitionError(tokenNames, e);

    if(Errors == null)
        Errors = new List<string>();

    String hdr = GetErrorHeader(e);
    String msg = GetErrorMessage(e, tokenNames);
    Errors.Add(msg + " at " + hdr);
}

public LogicalExpression GetExpression() => ncalcExpression().value;

}

@init {
    numberFormatInfo.NumberDecimalSeparator = ".";
}

@lexer::namespace { NCalcAsync }
@parser::namespace { NCalcAsync }

ncalcExpression returns [LogicalExpression value]
    : logicalExpression EOF! {$value = $logicalExpression.value; }
    ;

logicalExpression returns [LogicalExpression value]
    :   left=conditionalExpression { $value = $left.value; } ( '?' middle=conditionalExpression ':' right=conditionalExpression { $value = new TernaryExpression($left.value, $middle.value, $right.value); })?
    ;

conditionalExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=booleanAndExpression { $value = $left.value; } (
            ('||' | OR) { type = BinaryExpressionType.Or; }
            right=conditionalExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

booleanAndExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=bitwiseOrExpression { $value = $left.value; } (
            ('&&' | AND) { type = BinaryExpressionType.And; }
            right=bitwiseOrExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

bitwiseOrExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=bitwiseXOrExpression { $value = $left.value; } (
            '|' { type = BinaryExpressionType.BitwiseOr; }
            right=bitwiseOrExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

bitwiseXOrExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=bitwiseAndExpression { $value = $left.value; } (
            '^' { type = BinaryExpressionType.BitwiseXOr; }
            right=bitwiseAndExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

bitwiseAndExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=equalityExpression { $value = $left.value; } (
            '&' { type = BinaryExpressionType.BitwiseAnd; }
            right=equalityExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

equalityExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=relationalExpression { $value = $left.value; } (
            ( ('==' | '=' ) { type = BinaryExpressionType.Equal; }
            | ('!=' | '<>' ) { type = BinaryExpressionType.NotEqual; } )
            right=relationalExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

relationalExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=shiftExpression { $value = $left.value; } (
            ( '<' { type = BinaryExpressionType.Lesser; }
            | '<=' { type = BinaryExpressionType.LesserOrEqual; }
            | '>' { type = BinaryExpressionType.Greater; }
            | '>=' { type = BinaryExpressionType.GreaterOrEqual; } )
            right=shiftExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

shiftExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    : left=additiveExpression { $value = $left.value; } (
            ( '<<' { type = BinaryExpressionType.LeftShift; }
            | '>>' { type = BinaryExpressionType.RightShift; }  )
            right=additiveExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

additiveExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=multiplicativeExpression { $value = $left.value; } (
            ( '+' { type = BinaryExpressionType.Plus; }
            | '-' { type = BinaryExpressionType.Minus; } )
            right=multiplicativeExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

multiplicativeExpression returns [LogicalExpression value]
@init {
BinaryExpressionType type = BinaryExpressionType.Unknown;
}
    :   left=unaryExpression { $value = $left.value; } (
            ( '*' { type = BinaryExpressionType.Times; }
            | '/' { type = BinaryExpressionType.Div; }
            | '%' { type = BinaryExpressionType.Modulo; } )
            right=unaryExpression { $value = new BinaryExpression(type, $value, $right.value); }
            )*
    ;

unaryExpression returns [LogicalExpression value]
    :   exponentialExpression { $value = $exponentialExpression.value; }
    |   ('!' | NOT) exponentialExpression { $value = new UnaryExpression(UnaryExpressionType.Not, $exponentialExpression.value); }
    |   ('~') exponentialExpression { $value = new UnaryExpression(UnaryExpressionType.BitwiseNot, $exponentialExpression.value); }
    |   '-' exponentialExpression { $value = new UnaryExpression(UnaryExpressionType.Negate, $exponentialExpression.value); }
    |   '+' exponentialExpression { $value = new UnaryExpression(UnaryExpressionType.Positive, $exponentialExpression.value); }
    ;

exponentialExpression returns [LogicalExpression value]
    :   left=primaryExpression { $value = $left.value; } (
            '**' right=unaryExpression { $value = new BinaryExpression(BinaryExpressionType.Exponentiation, $value, $right.value); }
            )*
    ;

primaryExpression returns [LogicalExpression value]
    :   '(' logicalExpression ')'   { $value = $logicalExpression.value; }
    |   expr=value      { $value = $expr.value; }
    |   identifier {$value = (LogicalExpression) $identifier.value; } (arguments {$value = new Function($identifier.value, ($arguments.value).ToArray()); })?
    ;

value returns [ValueExpression value]
    :   INTEGER     { try { $value = new ValueExpression(int.Parse($INTEGER.text)); } catch(System.OverflowException) { $value = new ValueExpression(long.Parse($INTEGER.text)); } }
    |   FLOAT       { $value = new ValueExpression(double.Parse($FLOAT.text, NumberStyles.Float, numberFormatInfo)); }
    |   STRING      { $value = new ValueExpression(extractString($STRING.text)); }
    |   DATETIME    { $value = new ValueExpression(DateTime.Parse($DATETIME.text.Substring(1, $DATETIME.text.Length-2))); }
    |   TRUE        { $value = new ValueExpression(true); }
    |   FALSE       { $value = new ValueExpression(false); }
    ;

identifier returns[Identifier value]
    :   ID { $value = new Identifier($ID.text); }
    |   NAME { $value = new Identifier($NAME.text.Substring(1, $NAME.text.Length-2)); }
    ;

expressionList returns [List<LogicalExpression> value]
@init {
List<LogicalExpression> expressions = new List<LogicalExpression>();
}
    :   first=logicalExpression {expressions.Add($first.value);}  ( ',' follow=logicalExpression {expressions.Add($follow.value);})*
    { $value = expressions; }
    ;

arguments returns [List<LogicalExpression> value]
@init {
$value = new List<LogicalExpression>();
}
    :   '(' ( expressionList {$value = $expressionList.value;} )? ')'
    ;

TRUE:   T R U E ;
AND:    A N D ;
OR:     O R ;
NOT:    N O T ;


INTEGER
    :   DIGIT+
    ;

STRING
    :   '\'' ( EscapeSequence | (options {greedy=false;} : ~('\u0000'..'\u001f' | '\\' | '\'' ) ) )* '\''
    ;

DATETIME
    :   '#' (options {greedy=false;} : ~('#')*) '#'
    ;

NAME    :   '[' (options {greedy=false;} : ~(']')*) ']'
    ;

    :   ('E'|'e') ('+'|'-')? DIGIT+ 

fragment LETTER
    :   'a'..'z'
    |   'A'..'Z'
    |   '_'
    ;

fragment DIGIT
    :   '0'..'9'
    ;

fragment EscapeSequence
    :   '\\'
    (
        'n'
    |   'r'
    |   't'
    |   '\''
    |   '\\'
    |   UnicodeEscape
    )
    ;

fragment HexDigit 
    :   ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment UnicodeEscape
    :   'u' HexDigit HexDigit HexDigit HexDigit
    ;

/* Ignore white spaces */
WS  :  (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=Hidden;}
    ;

/* Allow case-insensitive operators by constructing them out of fragments.
 * Solution adapted from https://stackoverflow.com/a/22160240
 */
fragment A: 'a' | 'A';
fragment B: 'b' | 'B';
fragment C: 'c' | 'C';
fragment D: 'd' | 'D';
fragment E: 'e' | 'E';
fragment F: 'f' | 'F';
fragment G: 'g' | 'G';
fragment H: 'h' | 'H';
fragment I: 'i' | 'I';
fragment J: 'j' | 'J';
fragment K: 'k' | 'K';
fragment L: 'l' | 'L';
fragment M: 'm' | 'M';
fragment N: 'n' | 'N';
fragment O: 'o' | 'O';
fragment P: 'p' | 'P';
fragment Q: 'q' | 'Q';
fragment R: 'r' | 'R';
fragment S: 's' | 'S';
fragment T: 't' | 'T';
fragment U: 'u' | 'U';
fragment V: 'v' | 'V';
fragment W: 'w' | 'W';
fragment X: 'x' | 'X';
fragment Y: 'y' | 'Y';
fragment Z: 'z' | 'Z';
kaby76 commented 1 year ago

Sorry, but the install script in the instructions installs an older version of trprint that isn't compatible with the newer trparse/trconvert. For now, run trparse NCalc.g3 | trconvert | trtext instead, and make sure trparse, trconvert, and trtext are all at version 0.20.17.

trparse --version
trconvert --version
trtext --version
trprint --version

I'll update the instructions to make sure the install is correct. So, try this for now.

dotnet tool uninstall -g trparse
dotnet tool uninstall -g trtext
dotnet tool uninstall -g trconvert
dotnet tool uninstall -g trprint

dotnet tool install -g trparse --version 0.20.17
dotnet tool install -g trtext --version 0.20.17
dotnet tool install -g trconvert --version 0.20.17
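
As a rough way to confirm that the tools agree after reinstalling, the version checks above could be wrapped in a small shell function. This is only a sketch: check_versions is a hypothetical helper, not part of Trash, and it assumes that each tool's --version output contains the version number as plain text.

```shell
# Hypothetical helper (not part of Trash): report any tool whose
# `--version` output does not contain the expected version string.
# Assumption: each tool prints its version number as plain text.
check_versions() {
    expected=$1
    shift
    status=0
    for tool in "$@"; do
        # Capture whatever the tool prints; empty if it is not installed.
        actual=$("$tool" --version 2>/dev/null)
        case $actual in
            *"$expected"*) echo "ok: $tool" ;;
            *) echo "mismatch: $tool reports '${actual:-nothing}'"; status=1 ;;
        esac
    done
    return $status
}
```

It could then be called as check_versions 0.20.17 trparse trconvert trtext; a non-zero exit status means at least one tool is at a different version or is missing.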

Note that trconvert brings the grammar over to Antlr4 syntactically. The output of trconvert is a tree, but that tree isn't really an Antlr4 parse of the grammar, and it is no longer an Antlr3 parse tree either, because trconvert mucks around with it. To work with it further, you'll need to re-parse the saved file text as NCalc.g4 (an Antlr4 file). I think the grammar may need to be hand-patched: replace "( :" with just "(". I'm not sure whether the Antlr4 grammar needs to be updated or whether I need to remove that extra colon in trconvert.
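
The hand-patch for the stray colon could be approximated with sed. This is a minimal sketch rather than a Trash feature: strip_stray_colons is a hypothetical name, and the pattern assumes the leftover colon always follows an opening parenthesis directly or after whitespace. It would also rewrite a string literal that happens to contain "(:", so the result is worth checking by eye.

```shell
# Hypothetical helper: remove a colon that directly follows "(" (with
# optional whitespace in between), e.g. "( : ~('#')*)" -> "(~('#')*)".
# Reads the grammar text on stdin and writes the patched text to stdout.
strip_stray_colons() {
    sed 's/([[:space:]]*:[[:space:]]*/(/g'
}
```

For example, strip_stray_colons < NCalc.g4 > NCalc.patched.g4 leaves the original file untouched so the two can be compared.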

You may want to strip out all the actions if they're unnecessary for parsing. Although I have started to update trstrip, it's not working for this grammar.

markcanary commented 1 year ago

You are correct that trprint was not version 0.20.17, but the rest were.

I used trtext instead of trprint as you suggested:

trparse NCalc.g3 | trconvert | trtext > NCalc.g4

This gave me an output file.

> To work with it further, you'll need to re-parse the saved file text as NCalc.g4 (an Antlr4 file).

I don't know what this means. Are you saying I need to run the output file from trprint (NCalc.g4) through trparse again?

kaby76 commented 1 year ago

> To work with it further, you'll need to re-parse the saved file text as NCalc.g4 (an Antlr4 file).

> I don't know what this means. Are you saying I need to run the output file from trprint (NCalc.g4) through trparse again?

I'm just saying that if you want to edit or query the grammar further using a Trash tool, it would be best to parse it again as an Antlr4 grammar. trparse NCalc.g3 | trconvert | trtree outputs a modified Antlr3 parse tree. trparse NCalc.g4 | trtree, after you fix the colon problems, outputs an Antlr4 parse tree. So scripts written to work with Antlr4 grammars expect the tree to really be an Antlr4 parse tree.
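
The two stages described here can be sketched as a pair of shell functions. The function names are hypothetical; the pipelines themselves are the ones quoted in this thread, and the sketch assumes trparse, trconvert, trtext, and trtree are all installed at compatible versions.

```shell
# Stage 1: parse the Antlr3 grammar, convert it, and save the result as text.
# $1 is the Antlr3 input (e.g. NCalc.g3), $2 the Antlr4 file to write.
convert_g3_to_g4() {
    trparse "$1" | trconvert | trtext > "$2"
}

# Stage 2: after hand-fixing the saved file, parse it again as Antlr4 so
# that later tools see a real Antlr4 parse tree. $1 is the .g4 file.
reparse_as_antlr4() {
    trparse "$1" | trtree
}
```

Keeping the stages separate leaves room for the manual fixes between them, since the saved .g4 file may not parse until the stray colons are removed.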

I'm still working out some nice scripts, but you can see some in my blog, http://codinggorilla.com/.

markcanary commented 1 year ago

Thanks for the help. I was able to convert the file using your tools and some hand editing.