openEHR / specifications-BASE

openEHR general specifications and resources.

Expression language: no specification on what to do on multiply valued attributes #3

Open pieterbos opened 8 years ago

pieterbos commented 8 years ago

Take the following snippet:

/path/to/value + /path/to/another/value > 3

When /path/to/value and /path/to/another/value each have a single value in the object, that's quite easy. But these paths can often evaluate to multiple values. So what should happen in the following cases:

/path/to/value: [3]
/path/to/another/value: [3,4]

/path/to/value: [3,4]
/path/to/another/value: [3,4,5]

/path/to/value: [3,4,5]
/path/to/another/value: []

One solution would be to use the XPath-defined evaluation rules in https://www.w3.org/TR/xpath/ or https://www.w3.org/TR/xpath20/ or https://www.w3.org/TR/xpath-30/. Note that XPath 2 and 3 have not reached widespread adoption. See also issue #4.

wolandscat commented 8 years ago

There are probably a few solutions. One approach might be to limit expressions to only being able to use variables, where those variables already have their types declared. So instead of the first snippet above, you would have to have something like

$var1: real := /path/to/value
$var2: real := /path/to/another/value

$var1 + $var2 > 3

Since $var1 and $var2 are declared, the expression $var1 + $var2 > 3 can be statically checked to be valid (we assume that normal Integer -> Real promotion rules apply, to enable '3' to be converted to a Real).

If we assume that the RM is available during artefact parsing & validation (it is in tools like the ADL WB and adl-designer) then statements like $var2: real := /path/to/another/value can be type-checked too. This means the whole set of statements can be compile-time checked for validity. The downside is that type declarations are required for all variables.

A less strict approach would be to still allow expressions like /path/to/value + /path/to/another/value > 3 but to statically evaluate them at compile time and flag type inconsistencies. Upside: can avoid some type declarations; downside - it's not clear what the expression author thought the types were.

A more runtime oriented approach would be to fail expression evaluation if types could not be matched, but this isn't safe, and it doesn't seem useful to me.

pieterbos commented 8 years ago

The XPath specs have a different approach:

Although not allowing comparisons and operations on multiply valued attributes has some benefits in preventing mistakes, I'd say there's quite a benefit to not reinventing the wheel on this!

wolandscat commented 8 years ago

I'm not sure how those XPath rules would apply to multiple-valued attributes in the openEHR reference model. Do you have an example of an RM attribute in mind? It might make things clearer (at least for me!)

pieterbos commented 8 years ago

Of course. It's not just an RM attribute, it's everything below specific attributes that can have more than one value. For example, here is part of the blood pressure archetype from the Clinical Knowledge Manager:

OBSERVATION[id1] matches {  -- Blood Pressure
        data matches {
            HISTORY[id2] matches {  -- history
                events cardinality matches {1..*; unordered} matches {
                    EVENT[id7] occurrences matches {0..*} matches { -- any event
                        data matches {
                            ITEM_TREE[id4] matches {
                                items matches {
                                    ELEMENT[id5] occurrences matches {0..1} matches {   -- Systolic
                                        value matches {
                                            DV_QUANTITY[id1054] matches {
                                                property matches {[at1055]}
                                                magnitude matches {|0.0..<1000.0|}
                                                precision matches {0}
                                                units matches {"mm[Hg]"}
                                            }
                                        }
                                    }
                                    ELEMENT[id6] occurrences matches {0..1} matches {   -- Diastolic
                                        value matches {
                                            DV_QUANTITY[id1055] matches {
                                                property matches {[at1055]}
                                                magnitude matches {|0.0..<1000.0|}
                                                precision matches {0}
                                                units matches {"mm[Hg]"}
                                            }
                                        }
                                    }

Say we write:

/data[id2]/events[id7]/data[id4]/items[id5]/value[id1054]/value > /data[id2]/events[id7]/data[id4]/items[id6]/value[id1055]/value

Of course, you should write for_all $event in /data[id2]/events[id7], but nothing in the specs currently prevents anyone from writing the above assertion. However, the spec does not define what this rule means, or whether it is valid.

Say we add two blood pressure measurements to the history, 120/80 and 150/121. That would mean when evaluating rules:

[120,150] > [80, 121]

There are lots of ways to do this. XPath says the result should be true, because a value exists in the first set that is greater than a value in the second set: 150 > 80, or 150 > 121, or 120 > 80.
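A minimal Python sketch of that existential reading, for illustration only. The function name `exists_greater` is hypothetical and not part of any openEHR or XPath API; it treats each path result as a plain list of numbers.

```python
from typing import Sequence


def exists_greater(left: Sequence[float], right: Sequence[float]) -> bool:
    """Return True if at least one value in `left` is greater than
    at least one value in `right` (existential comparison)."""
    return any(a > b for a in left for b in right)


systolic = [120, 150]
diastolic = [80, 121]

print(exists_greater(systolic, diastolic))  # True, e.g. because 120 > 80
print(exists_greater(systolic, []))         # False, no pair can be formed
```

Under this reading, an empty list on either side always gives false, because no pair of values exists to compare.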

Whatever we choose is fine, but we should choose something and I think following existing standards would be fine here.

wolandscat commented 8 years ago

Following the spec, the expression (the last value should be magnitude):

/data[id2]/events[id7]/data[id4]/items[id5]/value[id1054]/magnitude > /data[id2]/events[id7]/data[id4]/items[id6]/value[id1055]/magnitude

entails a comparison of two Reals which is fine. If we write:

for_all $event in /data[id2]/events[id7] 
    $event/data[id4]/items[id5]/value/magnitude > $event/data[id4]/items[id6]/value/magnitude

I think the meaning is clear - each time around the loop, a Real comparison is being tested. Is the Xquery interpretation of the above a comparison of Real vectors?
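As a rough illustration of the for_all reading, the loop can be sketched in Python. The `Event` class and its field names are assumptions made up for this example; they stand in for the systolic and diastolic magnitudes found under each event.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    # Hypothetical stand-ins for items[id5]/value/magnitude and
    # items[id6]/value/magnitude within a single event.
    systolic: float
    diastolic: float


def systolic_above_diastolic_for_all(events: Iterable[Event]) -> bool:
    """One Real comparison per event; every comparison must hold."""
    return all(e.systolic > e.diastolic for e in events)


events = [Event(120, 80), Event(150, 121)]
print(systolic_above_diastolic_for_all(events))  # True

# One bad event makes the whole rule fail.
print(systolic_above_diastolic_for_all(events + [Event(70, 90)]))  # False
```

Each pass through the loop compares values from the same event, so systolic and diastolic readings from different events are never mixed.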

pieterbos commented 8 years ago

If the observation has two events, the AQL queries built from the paths to the magnitudes would each return two Real results, not one.

Unless you define that every event must be looped over and the rules applied inside that loop. Then the rules simply have to be evaluated twice in my example. That's also possible, but it would have to be clearly defined in the spec, and it could make rule evaluation more complicated to implement. Also, when you have an archetype containing two observations with a rule:

/path/in/first/observation > /path/in/second/observation

What should happen?

Or:

/path/in/observation > /path/to/singlevalue/not/in/observation

It's not clear in the specs what to do. The xpath variant, where paths can evaluate to node sets is one way to do it.

wolandscat commented 8 years ago

You are right - my mistake, I was forgetting that id7 is the code of any EVENT in that archetype, so they are all matched. So it probably makes sense to state:

I think that's compatible with the Xquery rules.

pieterbos commented 8 years ago

For the relational operators: that's a bit different from the XPath/XQuery approach, where the result is true if and only if you can find a pair of values, one from each set, for which the operator evaluates to true, regardless of size. But I think it could be a good way to define it for the relational operators (<, >, !=, ==, >= and <=).

For the arithmetic operators in XPath, the operands are first converted to numbers, conversion being defined for quite a lot of types including node sets, and then the operator is applied. They seem to have changed their mind halfway, because in XPath version 1, only the first element of a node set is used. In XPath 2, applying an arithmetic operator to a sequence of length > 1 results in an error.

Then you can run xpath 2 in xpath 1 compatibility mode to revert this behaviour to the old way...
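The two arithmetic behaviours can be contrasted with a small Python sketch. Both function names are hypothetical. The handling of empty operands (NaN in the first function, an empty result in the second) follows my reading of the XPath 1.0 and 2.0 specs and is simplified; treat it as an assumption rather than a faithful implementation.

```python
import math
from typing import List, Sequence


def add_xpath1_style(left: Sequence[float], right: Sequence[float]) -> float:
    """Use only the first item of each operand.
    Assumption: an empty operand converts to NaN."""
    a = left[0] if left else math.nan
    b = right[0] if right else math.nan
    return a + b


def add_xpath2_style(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Reject operands with more than one item.
    Assumption: an empty operand gives an empty result."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("arithmetic operand has more than one item")
    if not left or not right:
        return []
    return [left[0] + right[0]]


print(add_xpath1_style([3, 4], [3, 4, 5]))  # 6, only the first items are used
print(add_xpath2_style([3], [4]))           # [7]
```

With the first style, extra values are silently ignored; with the second, `add_xpath2_style([3, 4], [3])` raises instead of guessing.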

Because I implemented the prototype version in Archie before reading the xpath-specs and I had to choose something, I currently do:

evaluated as [3,4] + [3] = [6,7] and [3,4] + [1,2] = [4,6].
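A Python sketch that reproduces the two results quoted above may help make the behaviour concrete. This is not the Archie code; `broadcast_apply` is a hypothetical name, and raising an error on mismatched lengths is purely an assumption, since the thread does not say what the prototype does in that case.

```python
import operator
from typing import Callable, List, Sequence


def broadcast_apply(
    op: Callable[[float, float], float],
    left: Sequence[float],
    right: Sequence[float],
) -> List[float]:
    """Apply `op` element by element, repeating a single-valued operand
    to match the length of the other one."""
    left, right = list(left), list(right)
    if len(left) == 1:
        left = left * len(right)
    elif len(right) == 1:
        right = right * len(left)
    if len(left) != len(right):
        # Assumption: mismatched lengths are treated as an error here.
        raise ValueError("operands have incompatible lengths")
    return [op(a, b) for a, b in zip(left, right)]


print(broadcast_apply(operator.add, [3, 4], [3]))     # [6, 7]
print(broadcast_apply(operator.add, [3, 4], [1, 2]))  # [4, 6]
```

Cases such as [3,4] + [3,4,5] or [3,4,5] + [] remain open design questions; this sketch simply refuses them.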

I have no idea what the best approach is.

What to do with empty value lists (or node sets, or whatever we call them) is also up for discussion, I guess. For some things, what I do is a good match. For some score calculations with ordinals, we would need a way to express 'use the value of this ordinal, or if it does not exist, use value X', where X is sometimes 0, sometimes 1 and sometimes something else. We could also define functions that do this, or a function to sum a number of nodes, or find some other way.
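One possible shape for such helper functions, sketched in Python. The names `first_or_default` and `score` are hypothetical and do not come from any openEHR specification; each ordinal is modelled as a list that is empty when the value is absent.

```python
from typing import Iterable, Sequence


def first_or_default(values: Sequence[float], default: float) -> float:
    """Use the value if it exists, otherwise fall back to `default`."""
    return values[0] if values else default


def score(ordinals: Iterable[Sequence[float]], default: float = 0) -> float:
    """Sum a number of ordinal values, substituting `default` for missing ones."""
    return sum(first_or_default(v, default) for v in ordinals)


answers = [[2], [], [1]]          # the second ordinal was not recorded
print(score(answers))             # 3, missing value counted as 0
print(score(answers, default=1))  # 4, missing value counted as 1
```

Making the default an explicit argument keeps the choice of X visible in the rule itself rather than buried in the evaluation semantics.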