lutaml / expressir

Ruby parser for the ISO EXPRESS language
3 stars 3 forks source link

Replace Ruby with C++ parser #33

Closed zakjan closed 3 years ago

ronaldtse commented 3 years ago

Thanks @zakjan ! Could you also help add some benchmarks on parsing to see the speed differences between the old and new parsers? Will we have two separately maintained versions or only 1 moving forward?

zakjan commented 3 years ago

Currently it parses simpler files (action_schema.exp) but still there are some edge cases where it fails with a segfault. I'm on it. I'll run the parser with all valid input files again, to be sure that everything works.

Yes, I'll add benchmarks when it's finished. For simpler files that pass already, I can already observe faster timings similar to Java/JS parsers.

However, this is only for the basic parser. I found out that current method of attaching remarks in Ruby visitor is incompatible, because tokens are not exposed from C++ parser. I have yet to explore how to implement attaching remarks with C++ parser.

ronaldtse commented 3 years ago

@zakjan can you provide build instructions or put those inside a Rakefile so we can re-generate the code in GHA?

@CAMOBAP can you help with GHA here? There is a submodule to be pulled. Thanks!

zakjan commented 3 years ago

Submodules are fetched with git submodule update --init --recursive command in GHA. Previously used https://github.com/textbook/git-checkout-submodule-action worked only on Ubuntu, so I replaced it with a simple command.

Parser is compiled with bundle exec rake compile command in GHA.

See https://github.com/lutaml/expressir/pull/33/commits/5067d010ca53b450f88a2b7179441834791b5d54 for GHA changes.

@CAMOBAP There is a note in the beginning of GHA files that they are autogenerated. Could you point me to where the changes should belong?

Compilation currently succeeds only on Ubuntu. Compilation fails on WIndows because of missing Ruby source. Installing rice dependency fails on Mac because of missing aclocal. Could you help with these issues?

@ronaldtse Parser source files are committed to the repo at https://github.com/lutaml/expressir/tree/cpp-parser/ext/express-parser. They are generated with bundle exec ./exe/generate-parser ../express-grammar/Express.g4 command. They need to be re-generated when the grammar changes.

If I got the gem configuration right, it should compile automatically when the gem is installed. I need to try it.

ronaldtse commented 3 years ago

GitHub's actions/checkout action allows fetching submodules, so there's no need for an extra command.

Not only we need compilation to succeed but we need to be able to release compiled binary gems for the platforms.

@CAMOBAP would be able to do so.

zakjan commented 3 years ago

Hmm, weird, submodule didn't fetch automatically for me, I had to add the command. Or is it a switch to turn on somewhere?

I'm not sure if Ruby gems can be cross-compiled on a single machine for all platforms, and published precompiled to Rubygems?

ronaldtse commented 3 years ago

Hmm, weird, submodule didn't fetch automatically for me, I had to add the command. Or is it a switch to turn on somewhere?

I've added it in 18bf5df.

I'm not sure if Ruby gems can be cross-compiled on a single machine for all platforms, and published precompiled to Rubygems?

Ruby gems cannot be cross-compiled but they can be compiled on the different platforms and published as gems containing the appropriate platform binaries for pre-compiled downloads.

ronaldtse commented 3 years ago

@zakjan @CAMOBAP build is working on Ubuntu, but failing due to failing specs:

  1) Expressir::ExpressExp::Formatter.format formats a file
     Failure/Error: expect(result).to eq(expected_result)

       expected: "SCHEMA remark_schema;\nCONSTANT\n  remark_constant : STRING := 'xxx';\nEND_CONSTANT;\nTYPE remark_ty...chema scope - rule where\n--\"remark_schema.remark_rule.remark_where\" universal scope - rule where"
            got: "SCHEMA remark_schema;\nCONSTANT\n  remark_constant : STRING := 'xxx';\nEND_CONSTANT;\nTYPE remark_ty...remark_query <* remark_variable | TRUE);\n  WHERE\n    remark_where : TRUE;\nEND_RULE;\nEND_SCHEMA;"

       (compared using ==)

       Diff:

       @@ -31,15 +31,11 @@
          END_LOCAL;
          ALIAS remark_alias FOR remark_variable;
            ;
       -  --"remark_alias" function alias scope - function alias
          END_ALIAS;
          REPEAT remark_repeat := 1 TO 9;
            ;
       -  --"remark_repeat" function repeat scope - function repeat
          END_REPEAT;
       -  remark_variable := QUERY(remark_query <* remark_variable | TRUE
       -  --"remark_query" function query scope - function query
       -  );
       +  remark_variable := QUERY(remark_query <* remark_variable | TRUE);
        END_FUNCTION;

        PROCEDURE remark_procedure(remark_parameter : STRING);

       @@ -53,15 +49,11 @@
          END_LOCAL;
          ALIAS remark_alias FOR remark_variable;
            ;
       -  --"remark_alias" procedure alias scope - procedure alias
          END_ALIAS;
       ---"remark_schema.remark_entity.remark_attribute" universal scope - entity attribute
       ---"remark_schema.remark_entity.remark_derived_attribute" entity scope - entity derived attribute
       ---"remark_schema.remark_entity.remark_derived_attribute" schema scope - entity derived attribute
       ---"remark_schema.remark_entity.remark_derived_attribute" universal scope - entity derived attribute
       ---"remark_schema.remark_entity.remark_inverse_attribute" entity scope - entity inverse attribute
       ---"remark_schema.remark_entity.remark_inverse_attribute" schema scope - entity inverse attribute
       ---"remark_schema.remark_entity.remark_inverse_attribute" universal scope - entity inverse attribute
       ---"remark_schema.remark_entity.remark_unique" entity scope - entity unique
       ---"remark_schema.remark_entity.remark_unique" schema scope - entity unique
       ---"remark_schema.remark_entity.remark_unique" universal scope - entity unique
       ---"remark_schema.remark_entity.remark_where" entity scope - entity where
       ---"remark_schema.remark_entity.remark_where" schema scope - entity where
       ---"remark_schema.remark_entity.remark_where" universal scope - entity where
       ---"remark_schema.remark_subtype_constraint" schema scope - subtype constraint
       ---"remark_schema.remark_subtype_constraint" universal scope - subtype constraint
       ---"remark_schema.remark_function" schema scope - function
       ---"remark_schema.remark_function" universal scope - function
       ---"remark_schema.remark_function.remark_parameter" function scope - function parameter
       ---"remark_schema.remark_function.remark_parameter" schema scope - function parameter
       ---"remark_schema.remark_function.remark_parameter" universal scope - function parameter
       ---"remark_schema.remark_function.remark_type" function scope - function type
       ---"remark_schema.remark_function.remark_type" schema scope - function type
       ---"remark_schema.remark_function.remark_type" universal scope - function type
       ---"remark_schema.remark_function.remark_enumeration_item" function scope - function enumeration item
       ---"remark_schema.remark_function.remark_enumeration_item" schema scope - function enumeration item
       ---"remark_schema.remark_function.remark_enumeration_item" universal scope - function enumeration item
       ---"remark_schema.remark_function.remark_constant" function scope - function constant
       ---"remark_schema.remark_function.remark_constant" schema scope - function constant
       ---"remark_schema.remark_function.remark_constant" universal scope - function constant
       ---"remark_schema.remark_function.remark_variable" function scope - function variable
       ---"remark_schema.remark_function.remark_variable" schema scope - function variable
       ---"remark_schema.remark_function.remark_variable" universal scope - function variable
       ---"remark_schema.remark_procedure" schema scope - procedure
       ---"remark_schema.remark_procedure" universal scope - procedure
       ---"remark_schema.remark_procedure.remark_parameter" procedure scope - procedure parameter
       ---"remark_schema.remark_procedure.remark_parameter" schema scope - procedure parameter
       ---"remark_schema.remark_procedure.remark_parameter" universal scope - procedure parameter
       ---"remark_schema.remark_procedure.remark_type" procedure scope - procedure type
       ---"remark_schema.remark_procedure.remark_type" schema scope - procedure type
       ---"remark_schema.remark_procedure.remark_type" universal scope - procedure type
       ---"remark_schema.remark_procedure.remark_enumeration_item" procedure scope - procedure enumeration item
       ---"remark_schema.remark_procedure.remark_enumeration_item" schema scope - procedure enumeration item
       ---"remark_schema.remark_procedure.remark_enumeration_item" universal scope - procedure enumeration item
       ---"remark_schema.remark_procedure.remark_constant" procedure scope - procedure constant
       ---"remark_schema.remark_procedure.remark_constant" schema scope - procedure constant
       ---"remark_schema.remark_procedure.remark_constant" universal scope - procedure constant
       ---"remark_schema.remark_procedure.remark_variable" procedure scope - procedure variable
       ---"remark_schema.remark_procedure.remark_variable" schema scope - procedure variable
       ---"remark_schema.remark_procedure.remark_variable" universal scope - procedure variable
       ---"remark_schema.remark_rule" schema scope - rule
       ---"remark_schema.remark_rule" universal scope - rule
       ---"remark_schema.remark_rule.remark_type" rule scope - rule type
       ---"remark_schema.remark_rule.remark_type" schema scope - rule type
       ---"remark_schema.remark_rule.remark_type" universal scope - rule type
       ---"remark_schema.remark_rule.remark_enumeration_item" rule scope - rule enumeration item
       ---"remark_schema.remark_rule.remark_enumeration_item" schema scope - rule enumeration item
       ---"remark_schema.remark_rule.remark_enumeration_item" universal scope - rule enumeration item
       ---"remark_schema.remark_rule.remark_constant" rule scope - rule constant
       ---"remark_schema.remark_rule.remark_constant" schema scope - rule constant
       ---"remark_schema.remark_rule.remark_constant" universal scope - rule constant
       ---"remark_schema.remark_rule.remark_variable" rule scope - rule variable
       ---"remark_schema.remark_rule.remark_variable" schema scope - rule variable
       ---"remark_schema.remark_rule.remark_variable" universal scope - rule variable
       ---"remark_schema.remark_rule.remark_where" rule scope - rule where
       ---"remark_schema.remark_rule.remark_where" schema scope - rule where
       ---"remark_schema.remark_rule.remark_where" universal scope - rule where
     # ./spec/expressir/express_exp/format_remark_spec.rb:13:in `block (3 levels) in <top (required)>'

  2) Expressir::ExpressExp::Parser.from_file build an instance from a file
     Failure/Error: expect(x.remarks).to be_instance_of(Array)
       expected nil to be an instance of Array
     # ./spec/expressir/express_exp/parse_remark_spec.rb:13:in `block (4 levels) in <top (required)>'
     # ./spec/expressir/express_exp/parse_remark_spec.rb:11:in `tap'
     # ./spec/expressir/express_exp/parse_remark_spec.rb:11:in `block (3 levels) in <top (required)>'

Finished in 43.34 seconds (files took 0.33019 seconds to load)
11 examples, 2 failures, 1 pending

Failed examples:

rspec ./spec/expressir/express_exp/format_remark_spec.rb:7 # Expressir::ExpressExp::Formatter.format formats a file
rspec ./spec/expressir/express_exp/parse_remark_spec.rb:6 # Expressir::ExpressExp::Parser.from_file build an instance from a file
ronaldtse commented 3 years ago

GHA macOS compilation also works now, the same failures in specs.

@CAMOBAP only Windows needs to be built now.

ronaldtse commented 3 years ago

We should also adopt best practice such as how the Nokogiri gem maintains their native compiled gems:

zakjan commented 3 years ago

Thanks for the Mac fix. Failed remarks tests are because the current approach of attaching remarks is incompatible yet with the new parser, I disabled it until I fix it.

zakjan commented 3 years ago

Quick measurement parsing all 1246 valid iso-10303-stepmod files took 14m46s, avg 0.7s per file.

ronaldtse commented 3 years ago

LGTM @zakjan ! Feel free to merge when ready.