openEHR / adl-antlr

Antrl4 grammars for ADL
Apache License 2.0
1 stars 4 forks source link

Regarding ARCHETYPE_HRID #22

Open ghost opened 8 years ago

ghost commented 8 years ago

It is better to have items which are needed in AOM parsing in the Parser-section instead of the Lexer. This makes the AOM more safe to write and easier to maintain. The difference between the Parser and the Lexer I am referring to is that the Parser creates classes to define the grammar, these classes are easier to query.

Take a look at chapter 5.6 in the Antlr book, there is a list on page 81/82, and I refer to the last item. On page 83 is an example which explains (kind of) this issue also.

So instead of ARCHETYPE_HRID : ARCHETYPE_HRID_ROOT '.v' VERSION_ID ; ARCHETYPE_REF : ARCHETYPE_HRID_ROOT '.v' INTEGER ( '.' DIGIT+ )* ; fragment ARCHETYPE_HRID_ROOT : (NAMESPACE '::')? IDENTIFIER '-' IDENTIFIER '-' IDENTIFIER '.' LABEL ; ..................

It should become something like: archetype_hrid: archetype_hrid_root '.v' VERSION_ID ; archetype_hrid_root: (namespace '::')? IDENTIFIER '-' IDENTIFIER '-' IDENTIFIER '.' LABEL ; namespace : LABEL ('.' LABEL)+ ;

(needs to be worked out further)

ghost commented 8 years ago

I suggest this solution :

archetypeHRID: archetypeHRIDRoot '.v' releaseVersion ; archetypeHRIDRoot: (namespace '::')? referenceModelEntity '.' conceptId ;

referenceModelEntity: rmPublisher '-' rmPackage '-' rmClass ; rmPublisher: IDENTIFIER; rmPackage: IDENTIFIER; rmClass: IDENTIFIER;

conceptId: IDENTIFIER ;

releaseVersion: majorVersion '.' minorVersion '.' patchVersion '-' versionStatus ; majorVersion: DIGIT+; minorVersion: DIGIT+; patchVersion: DIGIT+; versionStatus: ( '-rc' | '-alpha' ) ( '.' buildCount+ )? ; buildCount: DIGIT+;

namespace : LABEL ('.' LABEL)+ ;

NB: This solution causes warnings: implicit definition of token IDENTIFIER in parser implicit definition of token DIGIT in parser implicit definition of token LABEL in parser

This is because these three Lexer rules are fragments. Fragments should not be referenced from a parser rule. So these must become full Lexer rules and then the warnings disappear.

ghost commented 8 years ago

I like to explain why I think my solution is better.

I have changed my grammar according to this, because, as said, it makes it easier to meet the information requirements of the AOM, which is the most important module that uses the parser.

The parser should comfort software, and take care of good performance. These objectives seem at first sight sometimes conflicting, but is that always the case?

One can consider that the performance gain from a concise fast parser/lexer-grammar is lost in the AOM, which anyway needs to split up the items to satisfy the information-requirements. As another result, there will be less code-complexity in the AOM and more clarity in the generated parser/grammar. So my solution does not really sacrifice performance, but avoids complicated code (bugs?)

ghost commented 8 years ago

I reset it to the original and found another solution by creating a ArchtypeHRID object which holds all properties