openEHR / adl-antlr

Antrl4 grammars for ADL
Apache License 2.0
1 stars 4 forks source link

problem with matching /path/elements matches {"value"} #20

Closed pieterbos closed 8 years ago

pieterbos commented 8 years ago

the parser reports an ambiguity with the following DFA:

s0-'/'->s1 s1-ALPHA_LC_ID->s2 s2-'/'->s4 s2-SYM_MATCHES->:s3=>1 s4-ALPHA_LC_ID->:s5^=>1

alternatives 1 and 2 are ambiguous.

Then instead of outputting one attribute_def, it outputs two:

/path

and

/element matches {"value"}

pieterbos commented 8 years ago

Oh, and you probably want to distinguish the following two cases:

/path/elements matches {"value"}

and

/path
/elements matches {"value"}
wolandscat commented 8 years ago

This is presumably due to the following rule in cadl.g4:

c_attribute: adl_dir? attribute_id c_existence? c_cardinality? ( SYM_MATCHES '{' c_objects '}' )? ;

I would have thought this was preferable, since the attribute_id is one field in a C_ATTRIBUTE, and the path goes in the differential_path field.

Also the two variants you point out above should be parsed the same way. Note that elements in the path can contain segments like attr_name[id3], but the last element (i.e. the final attribute name) cannot.

I'm not clear what the problem is here yet.

pieterbos commented 8 years ago

I agree that it's a good idea that differential path goes in one field and the attribute id in the other. That's not my issue report.

Now I look at my bug report and i'm in doubt if it is a bug. But it's an ambiguity in your grammar nevertheless and i don't know how it should be handled.

I was writing a custom test case by hand, in which i added the rule '/path/elements matches {"value"}', without id value in /path. This output two c_attributes, where i expected one. And it is ambiguous, if you look at the rule you mention, it could parse both as the two c_attributes '/path' and '/element matches {"value"}' or the one c_attribute '/path/element matches {"value"}'. I have no idea which of the two it should have picked.

'/path[id1]/elements matches {"value"}' is not ambiguous, due to the [id1] after path.

I'm not sure if the '/path/elements' path without [idx] can ever occur in an actual ADL and how it should be handled. So it could be not an issue :)

wolandscat commented 8 years ago

Ok, but it's true that /xxxx/yyyy/attr_id can occur - there are paths that don't need to have [idN] predicates in them, e.g. /content in the COMPOSITION class.

But I just realised the ambiguity (wasn't reading you properly!) It's because

/xxx/yyy

on its own is almost legal in CADL, but only if it has existence or cardinality on it. So in fact the valid patterns are:

/xxx/yyy existence matches {0} -- get rid of yyy at /xxx
/xxx/yyy existence matches {1} cardinality matches {1..4} -- mandate yyy at /xxx and control number of children
/xxx/yyy cardinality matches {1..4} -- control number of child objects
/xxx/yyy matches {...} -- child constraint objects

And everywhere you see /xxx/yyy, it could just be yyy. So a bare /xxx/yyy can't be legal on its own.

I can improve the grammar to deal with this properly.

wolandscat commented 8 years ago

This change is now committed.