smart-on-fhir / cumulus-library-data-metrics

A data metrics study for the Cumulus project
https://docs.smarthealthit.org/cumulus/
Apache License 2.0

c_us_core_v4_count: expand all mandatory fields #40

Closed by mikix 5 months ago

mikix commented 5 months ago

We currently combine all the mandatory checks into one boolean field: valid_mandatory.

The qualifier metric definition encourages stratifying by all mandatory and must-support fields.

This is helpful for folks who want to answer the question "which specific fields can I rely on?"

Implementation considerations

For CUBE performance reasons, we should split those into two tables: mandatory and must-support.

I've got some WIP code for that in a branch.

But I ran into performance issues running against BCH's Cerner database. The Observation Lab profile with its seven mandatory checks (including the status and overall valid fields) combined with the huge number of Observations was just too much. After 50 minutes, it bails with "query exhausted resources at this scale factor".
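For a rough sense of why the cube blows up: `CUBE` over n columns aggregates over every subset of those columns, so there are 2^n grouping sets and each extra stratifier doubles the count. A minimal sketch in Python, using hypothetical column names (the real table's columns may differ):

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """Return every subset of columns, i.e. the grouping sets a CUBE covers."""
    return [
        combo
        for size in range(len(columns) + 1)
        for combo in combinations(columns, size)
    ]

# Hypothetical column names, for illustration only.
few = ["year", "status", "valid_mandatory"]
many = ["year"] + [f"check_{i}" for i in range(1, 8)]  # year + seven checks

print(len(cube_grouping_sets(few)))   # 2**3 grouping sets
print(len(cube_grouping_sets(many)))  # 2**8 grouping sets
```

Each grouping set is a separate aggregation over the full set of Observations, which is presumably where the resource exhaustion comes from.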

So further slicing and dicing would be necessary (for CUBE output at least). I attempted that in the above branch by delineating between "fields" and "checks" (constraints like "X or Y must be present") and then only using "fields" for this metric.

But even that gets a little fuzzy. Take Immunization's must-support requirement for a status reason ("a statusReason if the vaccine wasn’t given"): the current profile code allows a null status, a non-not-done status, or a valid statusReason field. That's a whole package check combining "is the field present" with "does the field need to be present", so how much of that should we put in this metric?
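The three accepted cases described above can be sketched as a single predicate. This is only an illustration in Python, not the project's actual profile code; the function name is hypothetical, and "valid" is assumed here to mean simply "non-empty":

```python
def status_reason_check(status, status_reason):
    """Pass if a statusReason is not required, or is required and present."""
    if status is None:
        # Null status: nothing to base a requirement on.
        return True
    if status != "not-done":
        # Any status other than not-done: no reason is needed.
        return True
    # not-done: a statusReason must be present (assumption: non-empty counts as valid).
    return bool(status_reason)
```

Only the last branch is a plain "is the field present" test; the first two are about whether the field needs to be present at all, which is the part that blurs the line between a "field" and a "check".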

gotdan commented 5 months ago

Would taking out the year column (or converting it to decade) help with cubing?
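A minimal sketch of the decade idea in Python (the year range is made up for illustration). Note that bucketing leaves the number of grouping sets unchanged; it only cuts the number of distinct values in that column, and so the number of output rows:

```python
def to_decade(year):
    """Round a year down to the start of its decade, e.g. 2017 -> 2010."""
    return (year // 10) * 10

years = list(range(1990, 2024))  # hypothetical span of data
decades = sorted({to_decade(y) for y in years})

print(len(years), len(decades))
```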

mikix commented 5 months ago

Worth testing - and/or dropping status

mikix commented 5 months ago

Or just cutting the list of mandatory fields in two... because honestly the date field is still probably useful. For example, knowing that field X was present 100% of the time in the past year is useful, even if it is only present 20% of the time overall.

So we could just cut profiles with a lot of mandatory fields into tables like: mandatory1, mandatory2, must_support
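A minimal sketch of that split in Python. The function name, the `max_fields` cutoff of 4, and the field names are all assumptions for illustration:

```python
def split_into_tables(mandatory, must_support, max_fields=4):
    """Map table names to the fields each table would cube over."""
    chunks = [
        mandatory[i:i + max_fields]
        for i in range(0, len(mandatory), max_fields)
    ]
    tables = {}
    if len(chunks) == 1:
        # Short lists keep a single, unnumbered table.
        tables["mandatory"] = chunks[0]
    else:
        for n, chunk in enumerate(chunks, start=1):
            tables[f"mandatory{n}"] = chunk
    tables["must_support"] = must_support
    return tables

# Hypothetical field names, for illustration only.
fields = [f"field_{i}" for i in range(1, 8)]
for name, cols in split_into_tables(fields, ["ms_field"]).items():
    print(name, cols)
```

With seven mandatory fields this yields one table of four fields and one of three, so each cube covers far fewer grouping sets than a single seven-field table would. The tradeoff is that combinations of fields that land in different tables can no longer be cross-tabulated.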