smart-on-fhir / cumulus-library

https://docs.smarthealthit.org/cumulus/library/
Apache License 2.0

Support the data-metrics study having a per-study build option #308

Open mikix opened 1 week ago

mikix commented 1 week ago

Problem Statement

The data-metrics study (which is a special case in a lot of ways) wants a new special status of having per-study builds.

What I mean is, we want a user to be able to ask "what are the data metrics of the cohort selected by the covid study?". This would help determine if your study cohort data is well formed / has good data quality.

Since this will require special Library support in a few places, I wanted to file a tracking issue for the various pieces of this. Maybe these should be separate issues - but I wanted to leave open the option to discuss the whole approach here too.

Manifests/Inventories

We'll probably want the Library to start writing out inventory tables (a list of resource IDs in the study cohort) somewhere so that data-metrics could read it and scope down its investigation to a set of IDs rather than the whole database.

Maybe just patient & encounter IDs? Or could do it for all resources.

I don't know what table naming approach makes sense. Maybe study_name__lib_manifest_patients?
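To make the naming idea concrete, here is a minimal Python sketch of how an inventory table name could be built and then used to scope a query. Everything in it is an assumption for illustration: the helper names, the single `id` column in the inventory table, and the example source table are hypothetical, not existing Library API.

```python
def manifest_table_name(study_name: str, resource: str) -> str:
    """Build an inventory table name like study_name__lib_manifest_patients."""
    return f"{study_name}__lib_manifest_{resource}s"


def scoped_select(
    source_table: str, id_column: str, study_name: str, resource: str
) -> str:
    """Return SQL limiting source_table to IDs listed in the study's inventory.

    Assumes (hypothetically) that the inventory table exposes one `id` column.
    """
    manifest = manifest_table_name(study_name, resource)
    return (
        f"SELECT src.* FROM {source_table} AS src "
        f"WHERE src.{id_column} IN (SELECT id FROM {manifest})"
    )


if __name__ == "__main__":
    # Hypothetical source table and column names, for illustration only.
    print(scoped_select("core__condition", "subject_ref", "covid", "patient"))
```

With something like this, data-metrics could wrap each of its queries in the scoped form instead of scanning the whole database.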

Cleaning (solved by #309)

Another concern is that the Library likes to auto-clean a study prefix during build. If the data-metrics study is making per-study little mini-builds in a custom prefix (maybe data_metrics_study_name__*), we'll need to tell the Library to only clean that custom prefix.

Library code has the option for custom prefix cleaning. We just need to tell it which prefix.

Since that would be dynamic (likely based on some runtime option like --option study:study_name), we'd need the Library to call some study-based Python code for the prefix.

Maybe that could be more generic and have a manifest hook for some early Python that would allow editing the manifest definition (of which, study prefix is but one option).
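A rough sketch of the dynamic-prefix idea, assuming the runtime option arrives as a `key:value` string such as `study:covid`. The function names and the way the table list is obtained are hypothetical; this only illustrates deriving a `data_metrics_study_name__` prefix and restricting cleaning to tables under it.

```python
def parse_option(raw: str) -> tuple[str, str]:
    """Split a 'key:value' option string into its two parts."""
    key, _, value = raw.partition(":")
    return key, value


def custom_prefix(options: dict[str, str]) -> str:
    """Derive the per-study prefix, e.g. 'data_metrics_covid__'."""
    return f"data_metrics_{options['study']}__"


def tables_to_clean(all_tables: list[str], prefix: str) -> list[str]:
    """Keep only the tables that live under the custom prefix."""
    return [name for name in all_tables if name.startswith(prefix)]


if __name__ == "__main__":
    key, value = parse_option("study:covid")
    prefix = custom_prefix({key: value})
    existing = [
        "data_metrics_covid__count_c_us_core",  # hypothetical table names
        "data_metrics__count_c_us_core",
        "covid__count_month",
    ]
    print(tables_to_clean(existing, prefix))
```

The point of the filter is that a mini-build for one study never drops tables belonging to the full data-metrics build or to another study's mini-build.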

mikix commented 1 week ago

After talking it over, Matt and I are thinking that for the cleaning part we'll add something to the manifest.toml like:

prefix_generator = 'gen-my-prefix.py'

And this would allow the study to return a string (which Library would require to be [a-zA-Z_] or similar) to use as a prefix. Very custom but scoped-down approach.
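For illustration, the check on the returned string could look something like the sketch below. The `validate_prefix` name is hypothetical and the exact allowed character set is still undecided ("[a-zA-Z_] or similar"), so treat the pattern as a placeholder rather than the final rule.

```python
import re

# Placeholder pattern: letters and underscores only, at least one character.
_PREFIX_PATTERN = re.compile(r"[a-zA-Z_]+")


def validate_prefix(candidate: str) -> str:
    """Return the prefix unchanged if it is safe to use, else raise ValueError."""
    if not _PREFIX_PATTERN.fullmatch(candidate):
        raise ValueError(f"Invalid study prefix: {candidate!r}")
    return candidate


if __name__ == "__main__":
    print(validate_prefix("data_metrics_covid"))
    try:
        validate_prefix("data-metrics; DROP TABLE x")
    except ValueError as exc:
        print(exc)
```

Using `fullmatch` rather than `match` matters here, since the whole string has to be safe to interpolate into table names, not just its first few characters.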

mikix commented 1 day ago

The cleaning portion has been solved by #309