openEHR / adl-antlr

Antlr4 grammars for ADL
Apache License 2.0

.NET antlr parser errors #34

Status: Open. da-baranov opened this issue 2 years ago.

da-baranov commented 2 years ago

I'm using .NET 6 with antlr-4.10.1-complete.jar as the parser/lexer generator and the Antlr4.Runtime.Standard package as the runtime. I get some syntax errors when trying to parse the openEHR-EHR-SECTION.problem_list.v0.adl archetype (taken from CKM):

line 1:23 mismatched input '1.4' expecting VERSION_ID
line 5:2 mismatched input 'at0000' expecting AT_CODE

Here is a project to reproduce - https://github.com/da-baranov/openehr-sandbox/tree/main/openehr-antlr-test

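using System.IO;
using System.Text;
using Antlr4.Runtime;

public static void Main(string[] args)
{
    // Read the archetype downloaded from CKM.
    var fileContent = File.ReadAllText(@"openEHR-EHR-SECTION.problem_list.v0.adl", Encoding.UTF8);
    var inputStream = new AntlrInputStream(fileContent);

    // Lexer and Parser stand for the classes ANTLR generated from the ADL 1.4 grammar.
    var lexer = new Lexer(inputStream);
    var tokenStream = new CommonTokenStream(lexer);
    var parser = new Parser(tokenStream);

    // TestErrorListener collects syntax errors; TestListener observes parse events.
    var errorListener = new TestErrorListener();
    var listener = new TestListener();
    parser.AddParseListener(listener);
    parser.AddErrorListener(errorListener);

    // Parse starting from the ADL 1.4 archetype rule.
    parser.adl14_archetype();
}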
pieterbos commented 2 years ago

That is an ADL 1.4 archetype, and the errors indicate you are using the ADL 2 grammar. If you want to parse that file, use the ADL 1.4 grammar that is also in this project.

Alternatively, you can convert it to ADL 2, for example by using the Visual Studio Code extension.
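Which grammar applies is declared in the archetype header; the first reported error (line 1, column 23) is consistent with ANTLR stopping at the `1.4` in a header such as `archetype (adl_version=1.4)`. A minimal sketch, assuming all you need is to route each file to the right generated parser: read the declared `adl_version` and pick the ADL 1.4 grammar for `1.x` values. `AdlVersionSniffer` is a hypothetical helper, not part of this repository or of the ANTLR runtime.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of adl-antlr or the ANTLR runtime):
// reads the declared adl_version from an archetype header such as
//   archetype (adl_version=1.4)
// so the file can be routed to the matching generated parser.
public static class AdlVersionSniffer
{
    // "adl_version", optional spaces, "=", then a dotted number like 1.4 or 2.0.6.
    private static readonly Regex AdlVersion =
        new Regex(@"adl_version\s*=\s*(\d+(?:\.\d+)*)", RegexOptions.CultureInvariant);

    // Returns the declared version string, or null if there is none.
    public static string? DeclaredVersion(string adlText)
    {
        Match m = AdlVersion.Match(adlText);
        return m.Success ? m.Groups[1].Value : null;
    }

    // True for ADL 1.x archetypes, which should go to the ADL 1.4 grammar.
    public static bool IsAdl14(string adlText)
    {
        string? version = DeclaredVersion(adlText);
        return version != null && version.StartsWith("1.", StringComparison.Ordinal);
    }
}
```

For an ADL 2 header such as `archetype (adl_version=2.0.6; rm_release=1.0.4)`, `IsAdl14` returns false, so the file would go to the ADL 2 parser instead.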

da-baranov commented 2 years ago

Hm, I tried both the ADL 1.4 and ADL 2 grammars, and double-checked that the error messages come from the ADL 1.4 parser.

pieterbos commented 2 years ago

Could be that this grammar has an error. Could you try with the ADL 1.4 version in https://github.com/openEHR/archie/tree/master/grammars/src/main/antlr ? That one is extensively tested, and if that does work I can check the differences.

da-baranov commented 2 years ago

Thanks, I'll try it later, with both Java and .NET.

da-baranov commented 2 years ago


The Archie grammar works fine. Why not delete this repo? In the openEHR GitHub organization there are at least three openEHR repositories with ADL/ADL2 grammars.

pieterbos commented 2 years ago

The cause appears to be that, in an attempt to clean up this grammar, the separate ADL 1.4 and ADL 2 lexers from Archie were merged into a single lexer, and the result is not well tested, at least not with ADL 1.4. The errors you found are easy enough to fix, but a well-tested grammar covering all ADL features requires a good test set, probably with automated tests. Archie provides that; this grammar repository does not. However, because of this 'cleaned up' grammar, integrating it back into Archie is a lot of work, and not something we at Nedap can do right now.

I did in fact lightly test the ADL 2 grammar, and applied some fixes 11 months ago after it was cleaned up, but I stopped after that because we were not going to use it anyway. That means the ADL 1.4 grammar likely does not work, and even the ADL 2 one could contain errors.

I see some very obvious problems with the ADL 1.4 grammar. For example, VERSION_ID is referenced in the metadata, which should allow values such as '1.4', while VERSION_ID is defined as a full semantic version, with at least major, minor and patch numbers. Conversely, ARCHETYPE_HRID in ADL 2 is now not strict enough: it accepts ADL 1.4 archetype ids, which have only a major version. That is also not a good idea.
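To make the VERSION_ID and ARCHETYPE_HRID mismatch concrete, here is a sketch that uses regular expressions as stand-ins for the lexer rules. The patterns are simplified approximations written for illustration (they ignore namespaces and pre-release suffixes, for example); they are not copied from the grammar files.

```csharp
using System.Text.RegularExpressions;

// Simplified, hypothetical stand-ins for the lexer rules discussed above;
// NOT the actual VERSION_ID / ARCHETYPE_HRID definitions in the grammar.
public static class VersionRuleSketch
{
    // Full semantic version (major.minor.patch), as VERSION_ID is described.
    public static readonly Regex StrictVersionId = new Regex(@"^\d+\.\d+\.\d+$");

    // A looser dotted version that also admits ADL 1.4 metadata values like "1.4".
    public static readonly Regex DottedVersion = new Regex(@"^\d+(?:\.\d+)*$");

    // ADL 2 archetype HRID that requires a full version after ".v",
    // e.g. openEHR-EHR-SECTION.problem_list.v0.1.0 (namespaces and
    // pre-release suffixes are ignored in this sketch).
    public static readonly Regex StrictAdl2Hrid = new Regex(
        @"^[A-Za-z]\w*-[A-Za-z]\w*-[A-Za-z]\w*\.[A-Za-z][\w-]*\.v\d+\.\d+\.\d+$");
}
```

`StrictVersionId.IsMatch("1.4")` is false, which mirrors the reported `mismatched input '1.4' expecting VERSION_ID`. `StrictAdl2Hrid` shows the stricter HRID behaviour being asked for: an ADL 1.4 id ending in `.v0` is rejected, while `.v0.1.0` is accepted.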

I think a good single source for the ANTLR grammar is a really good idea, and that warrants the existence of this repository. However, that source should just work and be well tested. I think that matters more than having the cleanest grammar possible. So my suggestion would be to use the current Archie grammars as the official adl-antlr grammars, and to develop a good test suite first, before attempting any more grammar cleanups, or at least before merging them into the main branch. Switching to the Archie grammars can be done with little effort, and would mean a well-functioning ANTLR grammar for ADL again.
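A test suite of the kind suggested here could start as a corpus-driven regression check: parse every archetype in a directory and fail if any of them produces syntax errors. This is a minimal sketch; the `parse` delegate is a hypothetical hook where you would run the generated ANTLR lexer and parser and return the messages collected by an error listener such as the `TestErrorListener` in the original report.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Minimal sketch of a corpus-driven grammar regression check.
// The parse delegate is a hypothetical hook: it should run the generated
// ANTLR lexer/parser on the text and return the syntax errors it reported.
public static class GrammarRegression
{
    // Loads every *.adl file under a directory, keyed by path.
    public static Dictionary<string, string> LoadCorpus(string directory) =>
        Directory.EnumerateFiles(directory, "*.adl", SearchOption.AllDirectories)
                 .ToDictionary(path => path, path => File.ReadAllText(path, Encoding.UTF8));

    // Returns only the archetypes that produced at least one syntax error.
    public static Dictionary<string, IReadOnlyList<string>> FindFailures(
        IReadOnlyDictionary<string, string> corpus,
        Func<string, IReadOnlyList<string>> parse)
    {
        var failures = new Dictionary<string, IReadOnlyList<string>>();
        foreach (var (name, text) in corpus)
        {
            IReadOnlyList<string> errors = parse(text);
            if (errors.Count > 0)
                failures[name] = errors;
        }
        return failures;
    }
}
```

In a test project, an assertion that `FindFailures(GrammarRegression.LoadCorpus(dir), parseWithAdl14Grammar)` is empty would catch regressions like the ones reported in this issue; `dir` and `parseWithAdl14Grammar` are placeholders.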

pieterbos commented 2 years ago

Oh, I see we already have the experimental version, including a large test suite, in a separate repository at https://github.com/openEHR/openEHR-antlr4 . @wolandscat, do you know the status of that repository? Should we fix this one, or is the other one already production-ready?

wolandscat commented 2 years ago

The grammars in that development repo have been lightly tested with ADL 1.4 archetypes. I will have a look at their status tomorrow. I have mainly been developing the expression language and decision support language, but cleaned up the basic grammars a great deal in the process. The grammars you want are all under /combined; you will see a (simple) Java test rig there as well. I run it under the most recent IntelliJ on Linux.

wolandscat commented 2 years ago

I have done some more work on the ADL2 and ADL14 grammars to add modal lexing for embedded ODIN (C_DV_QUANTITY blocks) in cADL 1.4, and also for 'default' blocks in cADL 2. These blocks are handed off from the cADL readers to an ODIN reader. This appears to be working properly, although I need to do more in-depth testing.
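Modal lexing means the lexer switches to a different set of token rules while inside a particular block; ANTLR 4 supports this through lexer modes. The sketch below is a hand-rolled illustration of the same idea, not the grammar's implementation: after the `C_DV_QUANTITY` keyword it tracks `<`/`>` nesting, treats quoted strings and `|...|` intervals as opaque (they may contain `<` or `>`), and returns the whole ODIN block so that a separate ODIN reader could parse it. `OdinBlockExtractor` is hypothetical, and it ignores `--` comments for brevity.

```csharp
#nullable enable
using System;

// Hypothetical illustration of handing an embedded ODIN block to a separate
// reader. A modal lexer switches rule sets inside the block; here a simple
// bracket-depth scan plays the role of the "ODIN mode".
public static class OdinBlockExtractor
{
    // Returns the text from the '<' after the keyword to its matching '>',
    // or null if the keyword is missing or the brackets are unbalanced.
    // Simplification: '--' comments are not handled.
    public static string? ExtractAfter(string cadl, string keyword = "C_DV_QUANTITY")
    {
        int k = cadl.IndexOf(keyword, StringComparison.Ordinal);
        if (k < 0) return null;
        int start = cadl.IndexOf('<', k + keyword.Length);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false, inInterval = false;
        for (int i = start; i < cadl.Length; i++)
        {
            char c = cadl[i];
            if (inString)
            {
                if (c == '\\') i++;               // skip the escaped character
                else if (c == '"') inString = false;
            }
            else if (inInterval)
            {
                if (c == '|') inInterval = false; // end of |...| interval
            }
            else if (c == '"') inString = true;   // '<' or '>' in strings is data
            else if (c == '|') inInterval = true; // intervals such as |>=0.0|
            else if (c == '<') depth++;           // open a (nested) ODIN block
            else if (c == '>')
            {
                depth--;                          // close a block
                if (depth == 0)
                    return cadl.Substring(start, i - start + 1);
            }
        }
        return null;
    }
}
```

For a typical ADL 1.4 block such as `C_DV_QUANTITY < property = <[openehr::125]> list = < ... > >`, the method returns everything from the first `<` to its matching `>`, which is the text a separate ODIN reader would then receive.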