mikix closed this issue 5 months ago.
Would taking out the year column (or converting it to a decade) help with cubing?
Worth testing, and/or dropping `status`.
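One rough way to gauge whether that would help: `GROUP BY CUBE(...)` emits at most one row per combination of values across all grouping sets, so an upper bound on output size is the product of (distinct values + 1) over the cubed columns. The +1 is the rolled-up "all" (`NULL`) value. A minimal sketch, where the cardinalities (30 years, 8 status codes, seven true/false mandatory flags) are made-up illustration values, not measurements from any real database:

```python
from math import prod


def cube_row_bound(cardinalities: dict[str, int]) -> int:
    """Upper bound on rows from GROUP BY CUBE over these columns.

    Each column contributes its distinct values plus the rolled-up
    "all" (NULL) value, hence the +1. Real output is usually smaller,
    since not every combination actually appears in the data.
    """
    return prod(card + 1 for card in cardinalities.values())


# Hypothetical cardinalities, for illustration only.
flags = {f"valid_field{n}": 2 for n in range(1, 8)}  # 7 true/false mandatory flags
with_year = {"year": 30, "status": 8, **flags}
with_decade = {"decade": 4, "status": 8, **flags}
without_status = {"decade": 4, **flags}

print(cube_row_bound(with_year))       # 610173
print(cube_row_bound(with_decade))     # 98415
print(cube_row_bound(without_status))  # 10935
```

Under these assumptions the seven flags alone contribute a factor of 3^7 = 2187, so trimming the date column helps, but the number of mandatory flags matters at least as much.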
Or just cut the list of mandatory fields in two, because honestly the date field is still probably useful: knowing that field X has been present 100% of the time in the past year is useful, even if it's only present 20% of the time overall.
So we could split profiles with a lot of mandatory fields into tables like `mandatory1`, `mandatory2`, and `must_support`.
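A minimal sketch of that split, assuming a simple chunk size per table. The helper name `split_into_tables`, the `max_per_table=4` default, and the placeholder field names are all hypothetical; only the table names come from the comment above:

```python
def split_into_tables(
    mandatory: list[str],
    must_support: list[str],
    max_per_table: int = 4,
) -> dict[str, list[str]]:
    """Chunk mandatory fields into mandatory1, mandatory2, ... tables
    (at most `max_per_table` fields each), plus one must_support table."""
    if max_per_table < 1:
        raise ValueError("max_per_table must be at least 1")
    tables: dict[str, list[str]] = {}
    for start in range(0, len(mandatory), max_per_table):
        table_number = start // max_per_table + 1
        tables[f"mandatory{table_number}"] = mandatory[start:start + max_per_table]
    if must_support:
        tables["must_support"] = list(must_support)
    return tables


# Placeholder field names, not any real profile's field list.
mandatory = [f"m{n}" for n in range(1, 8)]  # seven mandatory fields
must_support = ["s1", "s2", "s3"]
for name, fields in split_into_tables(mandatory, must_support).items():
    print(name, fields)
# mandatory1 ['m1', 'm2', 'm3', 'm4']
# mandatory2 ['m5', 'm6', 'm7']
# must_support ['s1', 's2', 's3']
```

The date column would then be cubed alongside each table's fields, so every table can still answer "present 100% of the time in the past year" style questions.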
We currently combine all the mandatory checks into one boolean field: `valid_mandatory`. The qualifier metric definition encourages stratifying by each mandatory and must-support field.
This is helpful for folks who want to answer the question "which specific fields can I rely on?"
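To make the difference concrete, here is a minimal sketch of per-field flags alongside the combined flag. The `valid_<field>` naming, the helper name, and the "present means a non-empty top-level key" rule are hypothetical simplifications; the real profile checks may be stricter:

```python
def mandatory_flags(resource: dict, mandatory_fields: list[str]) -> dict[str, bool]:
    """Per-field presence flags plus the combined valid_mandatory flag.

    "Present" here is a simplification: the top-level key exists and
    its value is not None or empty.
    """
    flags = {
        f"valid_{field}": resource.get(field) not in (None, "", [], {})
        for field in mandatory_fields
    }
    # The combined flag is true only if every individual check passes.
    flags["valid_mandatory"] = all(flags.values())
    return flags


# Hypothetical Observation missing its category.
obs = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Hemoglobin"},
    "subject": {"reference": "Patient/123"},
}
fields = ["status", "category", "code", "subject"]
print(mandatory_flags(obs, fields))
# {'valid_status': True, 'valid_category': False, 'valid_code': True,
#  'valid_subject': True, 'valid_mandatory': False}
```

`valid_mandatory` alone only says the resource failed; the per-field columns show that `category` is the field you can't rely on.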
Implementation considerations
For CUBE performance reasons, we should split those into two tables: mandatory and must-support.
I've got some WIP code for that in a branch.
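The reason splitting helps: `CUBE` over n columns aggregates every subset of those columns, i.e. 2^n grouping sets. With k shared columns, m mandatory flags, and s must-support flags, one table costs 2^(k+m+s) grouping sets, while two tables cost 2^(k+m) + 2^(k+s). A small sketch that enumerates the grouping sets directly; the column counts are hypothetical:

```python
from itertools import combinations


def cube_grouping_sets(columns: list[str]) -> list[tuple[str, ...]]:
    """All grouping sets that GROUP BY CUBE(columns) expands to: every subset."""
    return [
        subset
        for size in range(len(columns) + 1)
        for subset in combinations(columns, size)
    ]


# Hypothetical column lists, for illustration only.
shared = ["year", "status"]
mandatory_cols = [f"valid_m{n}" for n in range(1, 8)]     # 7 mandatory flags
must_support_cols = [f"valid_s{n}" for n in range(1, 6)]  # 5 must-support flags

one_table = len(cube_grouping_sets(shared + mandatory_cols + must_support_cols))
two_tables = (
    len(cube_grouping_sets(shared + mandatory_cols))
    + len(cube_grouping_sets(shared + must_support_cols))
)
print(one_table, two_tables)  # 16384 640
```

The trade-off is that you can no longer cross-stratify a mandatory field against a must-support field, since they live in different tables.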
But I ran into performance issues running against BCH's Cerner database. The Observation Lab profile, with its seven mandatory checks (including the `status` and overall `valid` fields), combined with the huge number of Observations, was just too much: after 50 minutes, the query bails with "query exhausted resources at this scale factor". So further slicing and dicing would be necessary (for CUBE output, at least). I made an attempt at that in the above branch by distinguishing "fields" from "checks" (constraints like "X or Y must be present") and then using only "fields" for this metric.
But even that gets a little fuzzy. Take Immunization's must-support requirement for a status reason:
a statusReason if the vaccine wasn’t given
The current profile code passes if `status` is null, if `status` is anything other than `not-done`, or if there is a valid `statusReason` field. That's a compound check combining "is the field present?" with "does the field need to be present?", so how much of that should we put in this metric?
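One possible way to untangle it is to compute "present" and "required" as separate flags, then derive the package check from them. The metric could stratify on presence alone and leave the combined rule to the overall validity column. A sketch under that assumption; the helper names are hypothetical, and "valid `statusReason`" is simplified to "non-empty `statusReason`":

```python
def status_reason_present(immunization: dict) -> bool:
    # Simplified stand-in for "valid statusReason": the key exists and is non-empty.
    return bool(immunization.get("statusReason"))


def status_reason_required(immunization: dict) -> bool:
    # Required only when the vaccine wasn't given (status == "not-done").
    # A missing status never triggers the requirement.
    return immunization.get("status") == "not-done"


def status_reason_ok(immunization: dict) -> bool:
    """The package check: null status, non-not-done status, or a present statusReason."""
    return not status_reason_required(immunization) or status_reason_present(immunization)


cases = [
    {},                                                           # no status at all
    {"status": "completed"},                                      # vaccine given
    {"status": "not-done", "statusReason": {"text": "refused"}},  # not given, reason recorded
    {"status": "not-done"},                                       # not given, no reason
]
for case in cases:
    print(status_reason_present(case), status_reason_ok(case))
# False True
# False True
# True True
# False False
```

Note how the first two cases pass the package check without the field being present, which is exactly what a "which fields can I rely on?" metric would want to keep separate.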