skalarsystems / fhirzeug

A Python FHIR specification parser and class generator
Apache License 2.0
17 stars 1 forks source link

Bad validation for field "duration_unit" from TimingRepeat #52

Closed Wauplin closed 4 years ago

Wauplin commented 4 years ago

According to FHIR specs, duration_unit attribute must be one of "s | min | h | d | wk | mo | a" ( https://www.hl7.org/fhir/datatypes.html#Timing ). However, in the current implementation duration_unit is encoded as a FHIRCode which is not suited for enums.

Minimal example :

import r4

repeat_data = {
    "duration_unit": "not_a_predefined_unit",
}
repeat = r4.TimingRepeat(**repeat_data) # Must raise a ValidationError

Additional information about fhir codes : https://www.hl7.org/fhir/datatypes.html#code

EDIT : The problem is not about DocEnum validation. This code has the expected behavior (raises a ValidationError).

import r4

account_data = {
    "status": "not_a_predefined_status",
}
account = r4.Account(**account_data) # Raise an error because `AccountStatus` is a `DocEnum` object.
Wauplin commented 4 years ago

I have been digging through the code to understand why durationUnit is not encoded as a Valueset but as a FHIRCode.

As far as I understood, valuesets are encoded as DocEnum which are highly constrained only if the valueset is linked to a Codesystem that is known (we have a definition in the downloaded resources) and unique (only 1 codesystem per valueset). This is a limitation of current code which works for almost all FHIR valuesets. When a valueset cannot be encoded as a DocEnum, it is encoded as a FHIRCode which is basically a string constrained by the regex [^\s]+(\s[^\s]+)* => refuse strings with duplicated whitespaces => almost all strings are validated.

Our problem with durationUnit is that it is a Valueset based on the CodeSystem from http://unitsofmeasure.org (a unified codesystem for units). We do not have the profile for this codesystem which makes the valueset ignored and replaced by a FHIRCode string.

What I propose :

Solution 1

What I propose is to define a new object between FHIRCode and DocEnum which can handle this special scenario :

=> then we create a constrained string object for this valueset. For this new object, I would rather see a constrained string than an Enum since we don't have a name, code and description for each value, only a rule to validate them.

Example :

"resourceType": "ValueSet",
"id": "duration-units",
"url": "http://hl7.org/fhir/ValueSet/duration-units",
(...)
"compose": {
    "include": [
        {
            "system": "http://unitsofmeasure.org",
            "concept": [
                {"code": "ms", "display": "milliseconds"},
                {"code": "s", "display": "seconds"},
                (...)
                {"code": "a", "display": "years"},
            ],
        }
    ]
}

=> We generate an object DurationUnits = exact_regex_constr(regex="...") to validate this ValueSet

Other example:

"resourceType": "ValueSet",
"id": "all-time-units",
"url": "http://hl7.org/fhir/ValueSet/all-time-units",
"compose": {
    "include": [
        {
            "system": "http://unitsofmeasure.org",
            "filter": [{"property": "canonical", "op": "\u003d", "value": "a"}],
        }
    ]
},

=> In this example, the valueset is defined using a filter on all possible values of http://unitsofmeasure.org. We can't create a proper validation rule for it.

Solution 2:

We parse the UCUM specification file ( http://unitsofmeasure.org/ucum-essence.xml ) with all units to create a codesystem from it and use FHIRzeug as it is. I hope it could work that way. Formal definition is only available as XML and we will need custom code to create the proper FHIRCodeSystem python object, but it seems doable. This codesystem is used by 12 different ValueSets in current FHIR implementation.

Wauplin commented 4 years ago

In my opinion, solution 2 will take more time to implement properly but is more robust.

And also, here is a nice explanation of the difference between valuesets and codesystems that helped me : https://fhir-drills.github.io/ValueSet-And-CodeSystem.html .

Wauplin commented 4 years ago

What has been decided :

Side note : https://vonk.fire.ly/r4 is well validating the duration_unit field.