Regexp parsing breaks duration constraints

pieterbos commented 8 years ago

        duration_attr20 matches {Pw/PT0S}
        duration_attr21 matches {Pmw/PT0S}
        duration_attr22 matches {PWD/PT0S}

The lexer matches

/PT0S}
        duration_attr21 matches {Pmw/

as a regexp, which breaks parsing with an InputMismatchException ("mismatched input") later on.

Possible solution: the following lexer rule:

CONTAINED_REGEXP: '{'WS* (SLASH_REGEXP | CARET_REGEXP) WS* (';' WS* STRING)? WS* '}';
fragment SLASH_REGEXP: '/' SLASH_REGEXP_CHAR+ '/';
fragment SLASH_REGEXP_CHAR: ~[/\n\r] | ESCAPE_SEQ | '\\/';

fragment CARET_REGEXP: '^' CARET_REGEXP_CHAR+ '^';
fragment CARET_REGEXP_CHAR: ~[^\n\r] | ESCAPE_SEQ | '\\^';

pieterbos commented 8 years ago

Adding the CONTAINED_REGEXP lexer rule helped with the previous version of the grammar. With the new version it prints:

line 35:24 no viable alternative at input '/this|that|something else/'
line 36:24 no viable alternative at input '/cardio.*/'
line 124:29 mismatched input '/PT0S}\n\t\tduration_attr21 matches {Pmw/' expecting '}'

Tried switching Archie to the new version of the grammar. Found these bugs. For now switching back to the old version - it works much better for me at the moment.

wolandscat commented 8 years ago

@pieterbos just catching up on this issue. Have you checked if the current version of the grammar in this repo works? What form of regex grammar are you currently using? @BertVerhees what grammar are you currently using for regex? You proposed a fix here (https://openehr.atlassian.net/browse/SPECPR-181) - are you using it?

The main question I need to resolve is whether the grammar can work by treating a regex as a lexer entity or a parseable entity. Currently it is a lexer entity:

// ---------- Delimited Regex matcher ------------
// allows for '/' or '^' delimiters
REGEX: '/' ( '\\/' | ~'/' )+ '/' | '^' ( '\\^' | ~'^' )+ '^';

Bert has indicated it catches the single '/' in the {PWD/PT0S} duration examples, but I can't see how this can be so, since the REGEX lexer expression above cannot match a single '/'... OTOH I am never sure how Antlr really works, so my understanding may be wrong.

pieterbos commented 8 years ago

The current grammar does not work. I currently use a lexer rule that includes the {}-characters:

//a regexp can only exist between {}. It can optionally have an assumed value, by adding ;"value"
CONTAINED_REGEXP: '{'WS* (SLASH_REGEXP | CARET_REGEXP) WS* (';' WS* STRING)? WS* '}';
fragment SLASH_REGEXP: '/' SLASH_REGEXP_CHAR+ '/';
fragment SLASH_REGEXP_CHAR: ~[/\n\r] | ESCAPE_SEQ | '\\/';

fragment CARET_REGEXP: '^' CARET_REGEXP_CHAR+ '^';
fragment CARET_REGEXP_CHAR: ~[^\n\r] | ESCAPE_SEQ | '\\^';

Your lexer rule does not catch a single / in {PWD/PT0S}. However, if you have the following form:

attribute1 matches {PWD/PT0S} 
attribute2 matches {PWD/PT0S}

It often matches across lines in situations like that: /PT0S}\n attribute2 matches {PWD/ is a REGEX according to your grammar. It also has problems with paths.
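The cross-line match described above can be reproduced outside ANTLR. This is a minimal sketch, assuming that java.util.regex patterns are a fair stand-in for the two lexer rules; the class and method names (RegexRuleDemo, firstMatch) are made up for illustration and are not part of either grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRuleDemo {
    // Assumed java.util.regex rendering of the slash branch of REGEX:
    //   '/' ( '\\/' | ~'/' )+ '/'
    static final Pattern ORIGINAL = Pattern.compile("/(?:\\\\/|[^/])+/");

    // Same shape, but with line breaks excluded as in SLASH_REGEXP_CHAR: ~[/\n\r]
    static final Pattern SINGLE_LINE = Pattern.compile("/(?:\\\\/|[^/\\n\\r])+/");

    // Returns the first span the pattern matches, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String adl = "attribute1 matches {PWD/PT0S}\nattribute2 matches {PWD/PT0S}";
        // The original rule pairs the '/' on line 1 with the '/' on line 2.
        System.out.println(firstMatch(ORIGINAL, adl));
        // With line breaks excluded there is nothing to pair up, so this prints null.
        System.out.println(firstMatch(SINGLE_LINE, adl));
    }
}
```

Excluding line breaks only removes the cross-line case. Two slashes on one line, as in a path, would still pair up, which is presumably why the rule in use also requires the surrounding braces.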

I tried solving it with parser rules, and could not find a way to do it. I tried solving it with different lexer modes, and that might be possible, but it was very complicated to determine when to switch back and forth between the two lexer modes.

I then settled on my current solution, which works very well and is very simple, although it requires a very simple parser in code.
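To give an idea of what such a "very simple parser in code" might look like, here is a sketch that takes the text of one CONTAINED_REGEXP token apart. It is an assumption about the post-processing step, not the code actually used in Archie; the class ContainedRegexToken and its fields are hypothetical names.

```java
class ContainedRegexToken {
    final char delimiter;      // '/' or '^'
    final String pattern;      // regex text between the delimiters, escapes left as-is
    final String assumedValue; // content of the optional ;"value" part, or null

    private ContainedRegexToken(char delimiter, String pattern, String assumedValue) {
        this.delimiter = delimiter;
        this.pattern = pattern;
        this.assumedValue = assumedValue;
    }

    // Splits token text such as {/cardio.*/; "cardiology"} into its parts.
    static ContainedRegexToken parse(String tokenText) {
        if (tokenText.length() < 2 || !tokenText.startsWith("{") || !tokenText.endsWith("}")) {
            throw new IllegalArgumentException("not a brace-delimited token: " + tokenText);
        }
        String body = tokenText.substring(1, tokenText.length() - 1).trim();
        if (body.isEmpty() || (body.charAt(0) != '/' && body.charAt(0) != '^')) {
            throw new IllegalArgumentException("missing regex delimiter: " + tokenText);
        }
        char delimiter = body.charAt(0);
        int close = -1;
        for (int i = 1; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. the '/' in \/
            } else if (c == delimiter) {
                close = i;
                break;
            }
        }
        if (close < 2) {
            throw new IllegalArgumentException("unterminated or empty regex: " + tokenText);
        }
        String rest = body.substring(close + 1).trim();
        String assumed = null;
        if (rest.startsWith(";")) {
            String quoted = rest.substring(1).trim();
            if (quoted.length() < 2 || !quoted.startsWith("\"") || !quoted.endsWith("\"")) {
                throw new IllegalArgumentException("assumed value must be a quoted string: " + tokenText);
            }
            assumed = quoted.substring(1, quoted.length() - 1);
        } else if (!rest.isEmpty()) {
            throw new IllegalArgumentException("unexpected text after regex: " + tokenText);
        }
        return new ContainedRegexToken(delimiter, body.substring(1, close), assumed);
    }

    public static void main(String[] args) {
        ContainedRegexToken t = parse("{/cardio.*/; \"cardiology\"}");
        System.out.println(t.pattern + " / assumed: " + t.assumedValue);
    }
}
```

The price of the single-token approach is visible here: the structure that the grammar no longer describes (delimiters, assumed value) has to be recovered by hand.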

You could also solve it by adding a bit of java code in the antlr-grammar. You can quite easily do a negative lookahead in the parser rule and only match it if the previous token excluding whitespace was a '{', and the token after the current expression is a '}'. A similar trick probably works in the lexer rules.

You could probably also write a bit of code in the lexer rule that emits different lexer tokens, based on the specific parts. All these additions make the antlr-grammar output-language specific.
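The context check described above (previous non-whitespace symbol is '{', next one is '}') can be sketched in plain Java. In a real grammar it would sit in a semantic predicate and look at the token or character stream; here it works on a String so that it runs on its own. The class BraceContextCheck and the method isBraceDelimited are hypothetical names.

```java
class BraceContextCheck {
    // True if the candidate span input[start, end) is directly enclosed by
    // '{' and '}', ignoring whitespace on both sides.
    static boolean isBraceDelimited(String input, int start, int end) {
        int before = start - 1;
        while (before >= 0 && Character.isWhitespace(input.charAt(before))) {
            before--;
        }
        int after = end;
        while (after < input.length() && Character.isWhitespace(input.charAt(after))) {
            after++;
        }
        return before >= 0 && input.charAt(before) == '{'
                && after < input.length() && input.charAt(after) == '}';
    }

    public static void main(String[] args) {
        String regex = "x matches { /cardio.*/ }";
        System.out.println(isBraceDelimited(regex, regex.indexOf('/'), regex.lastIndexOf('/') + 1));

        // Candidate span runs from the '/' on line 1 to the '/' on line 2.
        String durations = "attribute1 matches {PWD/PT0S}\nattribute2 matches {PWD/PT0S}";
        System.out.println(isBraceDelimited(durations, durations.indexOf('/'), durations.lastIndexOf('/') + 1));
    }
}
```

As written, the check follows the description literally and would reject a regex followed by an assumed value, since the next symbol is then ';' rather than '}'.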

wolandscat commented 8 years ago

Doesn't your solution require a change to the c_attribute rule to enable the CONTAINED_REGEXP to plug in as a kind of c_object? The rule is:

c_attribute: adl_dir? rm_attribute_id ( c_existence | c_cardinality | c_existence c_cardinality )
    | adl_dir? rm_attribute_id c_existence? c_cardinality? SYM_MATCHES '{' c_objects '}'
    ;

I would have thought it had to become something like :

c_attribute: adl_dir? rm_attribute_id ( c_existence | c_cardinality | c_existence c_cardinality )
    | adl_dir? rm_attribute_id c_existence? c_cardinality? SYM_MATCHES ( '{' c_objects '}' | c_string_regex_block )
    ;

c_string_regex_block: CONTAINED_REGEXP ;

I would prefer to stay away from adding Java code to the Antlr spec - that makes using Antlr kind of pointless!

pieterbos commented 8 years ago

It certainly does require that, and I did use something like that. I think it is also needed in one or two more places: in the rules section and in archetype slot inclusion/exclusion.

I do not think it's a great solution, it's just that it works in all cases and is simple. All my attempts to find a more elegant solution so far did not work.

wolandscat commented 8 years ago

Indeed it is ugly, but I am coming around to the same conclusion.

ghost commented 8 years ago

I must look into it again. I will come back to this on Monday.

Bert

ghost commented 8 years ago

My solution works for all these constructs (fully tested), I hope this helps. There are two main groups of test-constructs, one without assumed_value and one with assumed value.

Please check whether your problem pattern is in it; if not, I am very interested to know what it is. (The grammar snippet that handles this is at the end. You also need to see the Java code that uses the grammar-generated code, so that is at the end too. If you want to see the AOM code as well, please look at my GitHub project: https://github.com/BertVerhees/archetyped_kernel)

    duration_attr1 matches {Pw}
    duration_attr2 matches {Pmw}
    duration_attr3 matches {PWD}
    duration_attr4 matches {PD}
    duration_attr5 matches {Pym}
    duration_attr6 matches {PdThms}
    duration_attr7 matches {PTs}
    duration_attr8 matches {PThm}
    duration_attr9 matches {PT0S}
    duration_attr11 matches {P1D}
    duration_attr12 matches {|P38W..P39W4D|}
    duration_attr13 matches {|>P38W..P39W4D|}
    duration_attr14 matches {|P38W..<P39W4D|}
    duration_attr15 matches {|>P38W..<P39W4D|}
    duration_attr16 matches {PT2H5M}
    duration_attr17 matches {PT1H55M}
    duration_attr18 matches {|<=PT1H|}
    duration_attr19 matches {PT1H30M}
    duration_attr20 matches {Pw/PT0S}
    duration_attr21 matches {Pmw/PT0S}
    duration_attr22 matches {PWD/PT0S}
    duration_attr23 matches {PD/PT0S}
    duration_attr24 matches {Pym/PT0S}
    duration_attr25 matches {PdThms/PT0S}
    duration_attr26 matches {PTs/PT0S}
    duration_attr27 matches {PThm/PT0S}
    duration_attr28 matches {Pw/|P38W..P39W4D|}
    duration_attr29 matches {Pmw/|P38W..P39W4D|}
    duration_attr30 matches {PWD/|P38W..P39W4D|}
    duration_attr31 matches {PD/|P38W..P39W4D|}
    duration_attr32 matches {Pym/|P38W..P39W4D|}
    duration_attr33 matches {PdThms/|P38W..P39W4D|}
    duration_attr34 matches {PTs/|P38W..P39W4D|}
    duration_attr35 matches {PThm/|P38W..P39W4D|}
    duration_attr36 matches {|>=PT0S|}
    duration_attr1_assumed matches {Pw;P2Y2M23DT23H15M12.3S}
    duration_attr2_assumed matches {Pmw;P23DT23H}
    duration_attr3_assumed matches {PWD;P23DT23H}
    duration_attr4_assumed matches {PD;P23DT23H}
    duration_attr5_assumed matches {Pym;P23DT23H}
    duration_attr6_assumed matches {PdThms;P23DT23H}
    duration_attr7_assumed matches {PTs;P23DT23H}
    duration_attr8_assumed matches {PThm;P23DT23H}
    duration_attr9_assumed matches {PT0S;P23DT23H}
    duration_attr10_assumed matches {PT0S;P23DT23H}
    duration_attr11_assumed matches {P1D;P23DT23H}
    duration_attr12_assumed matches {|P38W..P39W4D|;P23DT23H}
    duration_attr13_assumed matches {|>P38W..P39W4D|;P23DT23H}
    duration_attr14_assumed matches {|P38W..<P39W4D|;P23DT23H}
    duration_attr15_assumed matches {|>P38W..<P39W4D|;P23DT23H}
    duration_attr16_assumed matches {PT2H5M;P23DT23H}
    duration_attr17_assumed matches {PT1H55M;P23DT23H}
    duration_attr18_assumed matches {|<=PT1H|;P23DT23H}
    duration_attr19_assumed matches {PT1H30M;P23DT23H}
    duration_attr20_assumed matches {Pw/PT0S;P23DT23H}
    duration_attr21_assumed matches {Pmw/PT0S;P23DT23H}
    duration_attr22_assumed matches {PWD/PT0S;P23DT23H}
    duration_attr23_assumed matches {PD/PT0S;P23DT23H}
    duration_attr24_assumed matches {Pym/PT0S;P23DT23H}
    duration_attr25_assumed matches {PdThms/PT0S;P23DT23H}
    duration_attr26_assumed matches {PTs/PT0S;P23DT23H}
    duration_attr27_assumed matches {PThm/PT0S;P23DT23H}
    duration_attr28_assumed matches {Pw/|P38W..P39W4D|;P23DT23H}
    duration_attr29_assumed matches {Pmw/|P38W..P39W4D|;P23DT23H}
    duration_attr30_assumed matches {PWD/|P38W..P39W4D|;P23DT23H}
    duration_attr31_assumed matches {PD/|P38W..P39W4D|;P23DT23H}
    duration_attr32_assumed matches {Pym/|P38W..P39W4D|;P23DT23H}
    duration_attr33_assumed matches {PdThms/|P38W..P39W4D|;P23DT23H}
    duration_attr34_assumed matches {PTs/|P38W..P39W4D|;P23DT23H}
    duration_attr35_assumed matches {PThm/|P38W..P39W4D|;P23DT23H}
    duration_attr36_assumed matches {|>=PT0S|;P23DT23H}

durationIntervalValue : '|' SYM_GT? durationValue SYM_INTERVAL_SEP SYM_LT? durationValue '|' | '|' relop? durationValue '|' ;

durationValue : ISO8601_DURATION ;

durationListValue : durationValue ( ( ',' durationValue )+ | ',' SYM_LIST_CONTINUE ) ;

durationIntervalListValue : durationIntervalValue ( ( ',' durationIntervalValue )+ | ',' SYM_LIST_CONTINUE ) ;

assumedDurationValue: ';' durationValue ;

ISO8601_DURATION : 'P' (DIGIT+ [yY])? (DIGIT+ [mM])? (DIGIT+ [wW])? (DIGIT+[dD])? ('T' (DIGIT+[hH])? (DIGIT+[mM])? (DIGIT+ ('.'DIGIT+)?[sS])?)? ;

DURATION_CONSTRAINT_PATTERN : 'P' [yY]?[mM]?[Ww]?[dD]? ( 'T' [hH]?[mM]?[sS]? )? ;

SYM_LIST_CONTINUE: '...' ;

SYM_INTERVAL_SEP: '..' ;
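To see how the two duration lexer rules above divide up the test constructs, they can be approximated with java.util.regex. This is only a sketch under two assumptions: that DIGIT means [0-9], and that a whole-string match is a reasonable stand-in for what the lexer would do with an isolated piece of text. The class DurationTokenDemo and the method classify are made-up names.

```java
import java.util.regex.Pattern;

class DurationTokenDemo {
    // Assumed rendering of the ISO8601_DURATION lexer rule (DIGIT taken as [0-9]).
    static final Pattern ISO8601_DURATION = Pattern.compile(
            "P([0-9]+[yY])?([0-9]+[mM])?([0-9]+[wW])?([0-9]+[dD])?"
            + "(T([0-9]+[hH])?([0-9]+[mM])?([0-9]+(\\.[0-9]+)?[sS])?)?");

    // Assumed rendering of the DURATION_CONSTRAINT_PATTERN lexer rule.
    static final Pattern DURATION_CONSTRAINT_PATTERN = Pattern.compile(
            "P[yY]?[mM]?[wW]?[dD]?(T[hH]?[mM]?[sS]?)?");

    // Names the rule that matches the whole text. ISO8601_DURATION is tried
    // first because it is listed first in the grammar snippet.
    static String classify(String text) {
        if (ISO8601_DURATION.matcher(text).matches()) {
            return "ISO8601_DURATION";
        }
        if (DURATION_CONSTRAINT_PATTERN.matcher(text).matches()) {
            return "DURATION_CONSTRAINT_PATTERN";
        }
        return "NONE";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Pw", "PT0S", "PdThms", "P2Y2M23DT23H15M12.3S", "Pw/PT0S"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

A construct such as Pw/PT0S matches neither rule as a whole: it is a pattern, a single '/', and a value, and that lone '/' is the character a greedy regex rule can grab.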

    // Needs java.util.ArrayList and java.util.List; the other types come from the archetyped_kernel project.
    private CDuration cDuration(AdlParser.CDurationContext cDurationContext) throws ADL_AOM_Exception {
        DvDuration assumedValue = null;
        List<Interval> constraint = new ArrayList<>();
        try {
            if (cDurationContext.assumedDurationValue() != null)
                assumedValue = new DvDuration(cDurationContext.assumedDurationValue().durationValue().getText());
            if (cDurationContext.durationValue() != null) {
                // single value
                constraint.add(new PointInterval<>(cDurationContext.durationValue().getText(), DvDuration.class));
            } else if (cDurationContext.durationListValue() != null) {
                // list of values
                for (AdlParser.DurationValueContext iv : cDurationContext.durationListValue().durationValue()) {
                    constraint.add(new PointInterval<>(iv.getText(), DvDuration.class));
                }
            } else if (cDurationContext.durationIntervalValue() != null) {
                // single interval
                try {
                    constraint.add(new ProperInterval<>().fromString(cDurationContext.durationIntervalValue().getText(), DvDuration.class));
                } catch (Exception e) {
                    throw new ADL_AOM_Exception(String.format(ADL_AOM_Error.ILLEGAL_VALUE_CONSTRAINT_INTERVAL.getErrorString(), cDurationContext.durationIntervalValue().getText()), ADL_AOM_Error.ILLEGAL_VALUE_CONSTRAINT_INTERVAL, e);
                }
            } else if (cDurationContext.durationIntervalListValue() != null) {
                // list of intervals
                try {
                    for (AdlParser.DurationIntervalValueContext iiv : cDurationContext.durationIntervalListValue().durationIntervalValue()) {
                        constraint.add(new ProperInterval<>().fromString(iiv.getText(), DvDuration.class));
                    }
                } catch (Exception e) {
                    // Report the list text: durationIntervalValue() is null in this branch.
                    throw new ADL_AOM_Exception(String.format(ADL_AOM_Error.ILLEGAL_VALUE_CONSTRAINT_INTERVAL.getErrorString(), cDurationContext.durationIntervalListValue().getText()), ADL_AOM_Error.ILLEGAL_VALUE_CONSTRAINT_INTERVAL, e);
                }
            }
            // Default pattern unless the constraint supplies its own.
            String cDurationConstraintPattern = "PnYnMnWnDnTnHnMnS";
            if (cDurationContext.DURATION_CONSTRAINT_PATTERN() != null) {
                cDurationConstraintPattern = cDurationContext.DURATION_CONSTRAINT_PATTERN().getText();
            }
            return new CDuration(assumedValue, constraint, cDurationConstraintPattern, null, null);
        } catch (Exception e) {
            throw new ADL_AOM_Exception(e.getMessage(), e);
        }
    }

ghost commented 8 years ago

Sorry for the last block of code, it doesn't come out more decent. please copy it to a good editor, and it will probably be much more readable.

ghost commented 8 years ago

Maybe the difference is in the things I don't have (and apparently don't need to have, but I could be wrong, although everything seems to work).

IMHO a grammar divided into modules is harder to read, because I never know where to find things. I am using IntelliJ with an ANTLR plugin to maintain my grammar, and the plugin does not support jumping between grammar files. So that is also a reason to have everything in one file. It is only 533 lines, so it is not very big. IntelliJ also shows a clickable structure overview in the left panel.

But the contents of the ContainedRegex grammar are nowhere in my grammar, I am sure about that. And that content matters, because it consists mostly of lexer rules, and lexer rules are executed before parser rules. So lexer rules can affect the result of parsing even when no parser rule ever references them.

It is therefore very important to keep your lexer rules clean and to remove unused code, because it can still be active.

The best way to handle this is to use the debugging techniques described in the first chapters of the ANTLR4 book. It is true that debugging is not easy, and I think one could make a lot of money with a debugging ANTLR editor/plugin.

But as long as we don't have that, write very simple archetypes and debug them, and see which lexer/parser rules are involved and why. And then write numerous test constructs; that is what I do. It takes a lot of time, but the grammar is the core of everything. Do whatever it takes to feel pretty sure.

wolandscat commented 8 years ago

@BertVerhees is your current operational grammar here (https://github.com/BertVerhees/archetyped_kernel/tree/master/core/adl/src/main/grammar/openehr)? Or is that just a copy of the one here (from some point in time)?

If I understand correctly, your grammar doesn't have the problem reported here, but you don't use Pieter Bos' fix. How are you currently dealing with regex in your operational grammar?

ghost commented 8 years ago

Hi Thomas,

No, this is my grammar https://github.com/BertVerhees/archetyped_kernel/blob/master/core/adl/src/main/grammar/nl/rosa/archetype/Adl.g4

I don't have a problem with the regex which I call from the c_duration parser.

In an earlier message today (the long one) I described how I deal with the regex. It is very much like you originally wrote it. All the constructs in the examples in that message are supported.

Many other kinds of regex handling in the primitives are also well supported and very thoroughly tested in all facets, with hundreds of constructs.

If Pieter has a problem with a specific construct, I think he should send it, and I will be glad to test it too. It is hardly any trouble to do that.

With this archetype I test the primitives https://github.com/BertVerhees/archetyped_kernel/blob/master/core/adl/src/test/testResources/archetype/basic/openehr-TEST_PKG-WHOLE.primitive_types.v1.0.0.adls

They all run fine.

Bert

pieterbos commented 8 years ago

@BertVerhees, regexp handling in primitives might work well after your duration fix. However, there are more problems with regular expression parsing, and I wonder if you solved them. They include the paths after differential path constraints and possibly the use_node syntax.

I'll try to make and send some examples tomorrow.

Also you added '.*' in a parser rule. I wonder what that does in the generated lexer rules...

ghost commented 8 years ago

Hi Pieter, you are right, I only know they work for the things I have tested. Going further I may run into other situations. Maybe it is possible to have more grammars for specific situations, or to handle some issues in software after parsing. I don't know yet. When I know, I will let you know.

Best regards Bert

ghost commented 8 years ago

Also you added '.*' in a parser rule. I wonder what that does in the generated lexer rules...

Good point!

To be honest, Pieter, I forgot. It is not easy to find when and why I made such tiny changes to this grammar file. But it seems at the moment no problem that it is there, and there must have been a good reason to put it there.

Maybe later, I will try to find if it still is necessary.

ghost commented 8 years ago

there are more problems with regular expression parsing, and i wonder if you solved them

I must tell you Pieter, I only work a few hours every week, max 1 hour every day, on the adl-parser/aom. No-one is paying me for that. But still there is progress, only not so fast. I am not in a hurry ;-) I do it for fun.

So there will be a time when I must handle the use_node constraints, but I cannot foresee when that will be. I do it step by step. I hope it will be by the holiday season (in August).

I did last week the constraints where the second-order was involved, and I was very happy that the parser handled them well without any change. (Well done, Thomas)

When this part is ready, the fun will really start. The ADL/AOM is the core for handling many reference-models.

ghost commented 8 years ago

Also you added '.*' in a parser rule. I wonder what that does in the generated lexer rules...

Hi Pieter, something unexpected happened: my grammar file was overwritten by some old version. That is, I think, why I did not know the answer to your question.

The original is restored now.

Sorry for the confusion.

My regexp handling is like this:

regexConstraint: REGEX ;
REGEX: (SLASH_REGEXP | CARET_REGEXP);
SEMICOLON: ';';
fragment CARET_REGEXP: '^' CARET_REGEXP_CHAR+ '^';
fragment CARET_REGEXP_CHAR: ~[^\n\r] | ESCAPE_SEQ | '\\^';
fragment SLASH_REGEXP: '/' SLASH_REGEXP_CHAR+ '/';
fragment SLASH_REGEXP_CHAR: ~[/\n\r] | ESCAPE_SEQ | '\\/';
fragment ESCAPE_SEQ: '\\' ['"?abfnrtv] ;

wolandscat commented 8 years ago

I'll hold off making more changes until either or both of you can test them. I suspect that if Bert's current testing doesn't include constructs with differential paths, the simpler grammar works fine. But my intuition is that once all the constructs are included, we will end up with something like Pieter's CONTAINED_REGEXP solution. In fact, in my current yacc/lex grammar in the ADL Workbench, I do something equivalent (I use look ahead to spot the leading '{/' or '{^'). But there's no instant rush.

Any testing either of you can do that helps refine this grammar I think will be of great value to the wider community, because my aim here is to make this grammar the one that actually works out of the box (or at least one possible working grammar). So I'll incorporate all fixes anyone can provide, but I'll need to rely on you for testing them to be sure they will work.

ghost commented 8 years ago

My grammar is very similar to Pieter's. I don't know how it got mixed up in git, so that I was suddenly working with a wrong version. It must have happened after the 7th of June; I have an idea what I did wrong. But it is corrected now.

The important difference between my grammar and Pieter's with regard to regexps is that Pieter has made the regexp constraint a lexer rule, while mine is a parser rule. It has to do with precedence and with how the rest of the grammar looks. I remember having a conflict when it was a lexer rule.

But I see that Pieter is a lot further ahead than me, which is good: it is open source and it is well-written code.

ghost commented 8 years ago

Regarding Pieter's grammar: I cannot test it. I could not get his code to run; it was too much effort with gradle and so on, and it cost me a few hours. But I saw something in his grammar that may be an error.

Pieter has this:

adl_rules_path : variable_reference? adl_rules_path_segment+;
adl_rules_relative_path : adl_rules_path_element adl_rules_path ;
adl_rules_path_segment : ('/' | '//') adl_rules_path_element;
adl_rules_path_element : attribute_id ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?;

When a relative path is used, it goes to adl_rules_path, which goes to adl_rules_path_segment, which always starts with a slash. With this construct it is impossible to parse a relative path that consists of a single segment.

Adding a question mark might repair it, like this:

adl_rules_relative_path : adl_rules_path_element adl_rules_path? ;

But it is just a guess. I hope I am right, however ;-)

ghost commented 8 years ago

because my aim here is to make this grammar the one that actually works out of the box

For other people to use, Pieter's grammar should be better, because he has programmed it in the standard way, with listeners and so on, as recommended by Terence Parr. I am using another grammar-code path, which in my opinion works efficiently towards my own code but is less useful for people who have other plans. Pieter's grammar is cleaner and more understandable for others. I would recommend that anyone starting a project in this area use Pieter's grammar.

I don't think there should be competition over this. Pieter and I face many of the same problems, so we can support each other and learn from each other.

ghost commented 8 years ago

Tip

A tip for analyzing the problem. I have the same problem at the moment with archetype slots, which can also contain boolean expressions and paths with slashes.

They conflict with the regular expression.

1) Lexer rules are matched first, and the regular expression consists mainly of lexer rules.

2) So to let the path rules also run first, we need to make lexer rules of them.

3) To keep them out of the way of the regular expression lexer rules, they must be uniquely distinguishable. In the case of archetype slots, we can build a lexer rule that contains the whole string: starting at 'exclude', then a path, the word 'matches', a '{', a regex, up to the last '}'.

The disadvantage of this approach is that the complete lexer rule is returned as one string, which needs to be post-processed.

I am trying this now, but I must stop, and I have no time until next week.

Maybe you think it is a good tip; that is why I am writing it down.

This is what I do for the archetype slot, but it still does not work: it recognizes the string as an ALPHA_LC_ID.

But maybe this can be solved.

INCLUDE_LEXER: [Ii][Nn][Cc][Ll][Uu][Dd][Ee]
    (ALPHA_LC_ID ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?)
    ( '/' ALPHA_LC_ID ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?
    | '//' ALPHA_LC_ID ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?
    )?
    SYM_MATCHES '{' REGEX '}';

EXCLUDE_LEXER: [Ee][Xx][Cc][Ll][Uu][Dd][Ee]
    (ALPHA_LC_ID ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?)
    ( '/' ALPHA_LC_ID ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?
    | '//' ALPHA_LC_ID ( '[' (ID_CODE | ARCHETYPE_REF) ']' )?
    )?
    SYM_MATCHES '{' REGEX '}';

Next week I will try to solve this, but if you solve it first, I will be very happy.

pieterbos commented 8 years ago

So, Bert and I have reached the same conclusion - we only have found ways to fix this in the lexer using a rule that matches the regular expression plus the {}-characters, or with a bit of java code in the grammar.

Bert, to answer your other (slightly unrelated) questions/remarks:

I'm not sure what a relative path means in the context of the expression language, since there is no way to switch the context of the path lookup to anything other than the root element. So I parse absolute paths only for now, plus paths starting with a variable reference. But this part of the grammar is work in progress.
I agree that testing each other's grammar is quite a bit of effort. But gradle should not be the reason: just make sure JDK 8 is installed, check out Archie and type ./gradlew test. That will compile the code and run the tests, no tool installation required.

ghost commented 8 years ago

Thanks for this short instruction about gradle, I will try it next time.

Relative paths are used in archetype slots. When that feature goes, which I heard is planned, relative paths will no longer be needed, although I could imagine some functionality for them.

wolandscat commented 8 years ago

I've pushed changes that are more or less Pieter's solution, with needed changes elsewhere to integrate them.

openEHR / adl-antlr

Regexp parsing breaks duration constraints #27