the-human-colossus-foundation / oca-spec

Overlay Capture Architecture Specification
European Union Public License 1.2

OCA Layer for optional attributes #4

Closed neiljthomson closed 2 years ago

neiljthomson commented 2 years ago

The HL7 FHIR schema model defines most, if not all, document (aka record) attributes (aka properties) as optional (0..1, 0..N).

This raises a problem when an agent queries a database using an attribute in a calculation or filter expression and the stored document does not include that attribute. Note that this is not the same as the attribute existing in the stored document with a NULL value.

The problem is that, to process the query, the attribute typically has to be created (in, say, a temporary form of the record) and a "default value if attribute missing" applied to every attribute that the query requires but that does not exist in some or all of the documents in the query data set.
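To make the missing-versus-NULL distinction concrete, here is a minimal Python sketch. The function name `materialize`, the attribute `weight_kg`, and the sample documents are assumptions for illustration only; none of them come from OCA or FHIR.

```python
def materialize(doc, missing_defaults):
    """Build a temporary copy of `doc` for query processing.

    Attributes absent from `doc` receive their "default value if
    attribute missing". Attributes that are present keep their stored
    value, even when that value is None (NULL).
    """
    # Later keys win, so anything stored in `doc` overrides the defaults.
    return {**missing_defaults, **doc}


# Hypothetical stored documents: one complete, one with an explicit NULL,
# one where the attribute does not exist at all.
docs = [
    {"id": "a", "weight_kg": 70.0},
    {"id": "b", "weight_kg": None},
    {"id": "c"},
]

temp_records = [materialize(d, {"weight_kg": 0.0}) for d in docs]
weights = [r["weight_kg"] for r in temp_records]
```

Only document "c" receives the default; document "b" keeps its explicit NULL, and the stored documents themselves are left unmodified.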

A solution could be a separate OCA layer that defines a "default value if attribute missing" for any attribute. Note: if a general "default value" is already defined within an OCA layer, the "default value if attribute missing" can either reference that general default value explicitly or fall back to it when nothing else is specified.
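The fallback rule described above could be sketched roughly as follows. The two dictionaries stand in for overlays, and the `"$default"` marker is a made-up way to express an explicit reference to the general default; this is not an existing OCA overlay format.

```python
# Hypothetical marker meaning "use the general default value".
USE_GENERAL_DEFAULT = "$default"


def resolve_missing_default(attr, missing_overlay, default_overlay):
    """Return the value to use when `attr` is absent from a document."""
    if attr in missing_overlay:
        value = missing_overlay[attr]
        if value == USE_GENERAL_DEFAULT:
            # Explicit reference to the general default value.
            return default_overlay[attr]
        return value
    if attr in default_overlay:
        # Nothing specified for the missing case: fall back implicitly.
        return default_overlay[attr]
    raise KeyError(f"no default of any kind is defined for {attr!r}")


# Illustrative overlay contents (assumed attribute names and values).
default_overlay = {"smoker": False, "units": "metric"}
missing_overlay = {"smoker": USE_GENERAL_DEFAULT, "visits": 0}

smoker = resolve_missing_default("smoker", missing_overlay, default_overlay)
visits = resolve_missing_default("visits", missing_overlay, default_overlay)
units = resolve_missing_default("units", missing_overlay, default_overlay)
```

Here `smoker` resolves through the explicit reference, `visits` uses its own missing-attribute default, and `units` falls back implicitly to the general default.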

Then there is the case of attributes such as "identifier" (e.g. patient.identifier) that are declared optional but are really mandatory, and for which a default value cannot be determined. If such an attribute is missing, this may be an error condition. This suggests that an OCA layer could provide references to external specifications, etc., describing how a data processor should deal with these exceptions. A potential solution is procedural logic (e.g., if/then/else in JSON form).
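One possible shape for such JSON rules, with a small Python interpreter, is sketched below. The rule keys (`attribute`, `if_missing`, `action`, `value`, `see`), the exception class, and the placeholder reference string are all hypothetical; OCA does not define them.

```python
import json

# Hypothetical rule set: what a data processor should do per missing attribute.
RULES_JSON = """
[
  {"attribute": "identifier",
   "if_missing": {"action": "error", "see": "external-spec-placeholder"}},
  {"attribute": "active",
   "if_missing": {"action": "default", "value": true}}
]
"""


class MissingMandatoryAttribute(Exception):
    """Raised when an attribute that is effectively mandatory is absent."""


def apply_rules(doc, rules):
    """Return a copy of `doc` with the missing-attribute rules applied."""
    result = dict(doc)
    for rule in rules:
        attr = rule["attribute"]
        if attr in result:
            continue  # present (even if NULL): nothing to do
        policy = rule["if_missing"]
        if policy["action"] == "error":
            raise MissingMandatoryAttribute(
                f"{attr} is missing; see {policy['see']}"
            )
        if policy["action"] == "default":
            result[attr] = policy["value"]
    return result


rules = json.loads(RULES_JSON)
patient_ok = apply_rules({"identifier": "p-001"}, rules)
```

A document without `identifier` raises `MissingMandatoryAttribute`, while a missing `active` attribute is filled in silently.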

mitfik commented 2 years ago

I think "missing" attributes are quite easy to address by applying a subset overlay, from which the person querying a specific repository can know what is available.

For example: let's say we have a Schema A which represents a patient's diabetes data. Repository A stores these records but would like to exclude sensitive data about the patient (name, identifier, etc.) by applying Subset overlay A.

When a researcher comes to Repository A, they learn that Schema A is available, but only with Subset overlay A. This automatically tells them which attributes will be missing, even though they operate on Schema A.
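As a rough illustration of the Schema A and Subset overlay A example, the sketch below models both as plain lists of attribute names. The attribute names are invented and this is not the actual OCA subset overlay format.

```python
# Hypothetical attribute list for Schema A (diabetes data about a patient).
schema_a = ["name", "identifier", "birth_date", "hba1c", "insulin_dose"]

# Hypothetical Subset overlay A: the attributes Repository A exposes.
subset_overlay_a = ["birth_date", "hba1c", "insulin_dose"]


def missing_attributes(schema_attrs, subset_attrs):
    """List schema attributes that the subset overlay excludes, in schema order."""
    allowed = set(subset_attrs)
    return [attr for attr in schema_attrs if attr not in allowed]


excluded = missing_attributes(schema_a, subset_overlay_a)
```

With these inputs, `excluded` contains `name` and `identifier`, so a researcher knows before querying that those attributes will be absent.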

Regarding default values, I think this is very use-case specific and could be defined on the researcher's side, since only they know which "default" values would satisfy them. If it were in the form of an overlay, who decides which default values are valid? What if you have more than one overlay like that? Can I have an overlay on the capture side and on the query side?