Closed pieterbos closed 8 years ago
So far I cannot discover a reason for this problem - the Antlr debugging tools are extremely primitive, so I may be missing something obvious.
However, I noticed that the date and date/time patterns parse correctly, i.e.
THING[id4] matches {
date_time_attr1 matches {yyyy-mm-XXThh:mm:ss}
date_attr1 matches {yyyy-mm-??}
}
This shows that the time pattern 'hh:mm:ss' is described properly by the lex rules, since the date/time constraint pattern (the first one) is specified by a rule that re-uses the time constraint pattern.
I also noticed that the time pattern parses if it commences with capital 'HH' or has a space at the end, i.e.
THING[id4] matches {
time_attr1 matches {HH:mm:ss} // pass
time_attr1 matches {hh:mm:ss } // pass
time_attr1 matches {hh:mm:ss} // FAIL
}
I cannot tell what the parser is seeing in the last line, although for some reason it is running past the 'ss' and including the '}'. The error I get is:
line 4:24 mismatched input 'hh:mm:ss}' expecting {'[', '/', '-', '^', '+', '|', DATE_CONSTRAINT_PATTERN, TIME_CONSTRAINT_PATTERN, DATE_TIME_CONSTRAINT_PATTERN, DURATION_CONSTRAINT_PATTERN, SYM_USE_NODE, SYM_USE_ARCHETYPE, SYM_ALLOW_ARCHETYPE, SYM_AFTER, SYM_BEFORE, ISO8601_DATE, ISO8601_TIME, ISO8601_DATE_TIME, ISO8601_DURATION, SYM_TRUE, SYM_FALSE, ALPHA_UC_ID, INTEGER, REAL, STRING}
t says here that it can't match TIME_CONSTRAINT_PATTERN, but that must be because the scanner has read 'hh:mm:ss}' instead of just 'hh:mm:ss'. I don't understand this behaviour.
I can't check now, but you can run the antlr runtime with the -tokens parameter, and it will print the token stream from the lexer. Then you can lookup the symbols corresponding to the numbers in the output with the adl.tokens file that is generated and see what is wrong.i think you are right about an ambiguity in the lexer.
I'll check on Monday if you haven't already.
Looking at the lexer rules - the URI rule matches the time pattern, as in, it will not match if it starts with a capital letter but will match with a lowercase 'hh' at the start.
Aha - yes that's it - I had forgotten the URI pattern was defined as more or less 'anything'. I've replaced it with a reasonable proper URI scanner rule set. It appears to correctly scan some URIs that I have tested it with and also fix the problem reported here.
Thanks for the testing - I think we are getting close to something.
I integrated your latest modifications in https://github.com/nedap/archie . Looks like it works very well, the time patterns parse, and so do the URIs.
There's one URI that did not parse, but it contains invalid characters for an URL. A webbrowser however will usually convert them to URL-encoded characters: http://air93.org/MZN-SIF-Dihanje-Kašelj in the file openEHR-EHR-OBSERVATION.value_set_binding.v1.adls
I don't think that needs fixing, although I'm not sure what the specification says about the URI syntax.
This is definately fixed, The URI looks fine as it is.
We probably need to upgrade the URI spec and grammar rules to allow UTF-8 chars, but I think we might leave this to a separate issue.
gives errors:
I haven't checked the token stream, but the ambiguity warning at token 135 could be a cause.