Closed ColinTimBarndt closed 3 years ago
Embedded parser/lexer actions are written in the target language, so you need to translate them to Rust manually before generating the parser from the grammar.
Okay, I was not aware of this ANTLR feature. What is the equivalent of the _input variable in the Rust version?
recog.input.la(-1)
recog.input.la(-1) does not work because recog.input is an Option<Input>. Can I expect that the option is Some?
Input::la returns an isize, but the documentation states that it returns the value of the current symbol in the stream. The byte size of an isize varies depending on the target system and might not be able to fit a whole character depending on the target. Because of this, I can't cast the returned value to a char in Rust.
recog.input.la(-1) does not work because recog.input is an Option<Input>. Can I expect that the option is Some?
Yes, but my initial advice was not perfect; it is better to use recog.input().la(-1), which handles it.
might not be able to fit a whole character depending on the target
True, but do you really want to run it on 16-bit targets and support the full Unicode codespace?
Because of this, I can't cast the returned value to a char in Rust.
If your only problem is casting back to char, you will have to convert with char::try_from().unwrap() regardless of the type I choose to hold the current symbol, because there is a dedicated EOF symbol I have to support.
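For anyone following along, here is a minimal sketch of that conversion. It assumes that la() reports EOF as a negative value (-1 is the usual ANTLR convention) and that every other value is a Unicode code point. symbol_to_char is a hypothetical helper name, not part of the runtime's API, and it returns an Option instead of unwrapping so that EOF does not panic.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a lookahead symbol (as returned by `la`)
/// into a `char`. Assumes negative values (such as EOF = -1) mean
/// "no character".
fn symbol_to_char(symbol: isize) -> Option<char> {
    // Negative values and values that do not fit in a u32 are rejected here.
    let code_point = u32::try_from(symbol).ok()?;
    // Surrogates and values above U+10FFFF are rejected here.
    char::try_from(code_point).ok()
}
```

Inside a predicate this would presumably be called as symbol_to_char(recog.input().la(-1)), treating None as "predicate fails".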
Also, the Java parser code that you linked assumes parsing over UTF-16 code units (not really sure why). Technically, you can port it to Rust exactly like this with some manual UTF-16 transformations. But I would really recommend that you parse over full Unicode code points. That will let you parse over a Unicode str directly, and the lexer will have only two | ~[\u0000-\u007F] { <check for java unicode identifier> }? parts.
Thank you very much for your help, I finally got ANTLR working with your repository. I ran a first test with the following Java code and it parsed successfully, producing a parse tree. The modified grammar file might be useful for some people, but I am unsure where to make it available.
abstract class TestClass extends Other {
    protected final int alpha;

    public TestClass(int a) {
        super();
        this.alpha = a;
    }
}
As stated in the title, I tried to generate a parser for the official Java9 grammar in the antlr/grammars-v4 repository and it generated the following code that causes syntax and import errors:
I did not change any indentation. Apart from that, it requires some Character structure which is not imported and I do not know where it is defined. I looked through the generated files and it is not defined there. (char)_input.LA(-2) seems to be a Java leftover. I think that this code originally cast a Java int to a char, which does not exist in Rust.
The raw syntax errors: