mmcdermott / MEDS_transforms

A simple set of MEDS polars-based ETL and transformation functions
MIT License

We need to be able to support joining on metadata based on partial code matches (e.g., no `valueuom`). #148

Open mmcdermott opened 1 month ago

mmcdermott commented 1 month ago

https://github.com/mmcdermott/MEDS_transforms/blob/b5f027764441a383e9510804c37633e1b0f30e0f/MIMIC-IV_Example/configs/event_configs.yaml#L211

mmcdermott commented 1 month ago

The solution here is to make it so that, in `extract_code_metadata`, if the `metadata_config` (e.g., https://github.com/mmcdermott/MEDS_transforms/blob/main/MIMIC-IV_Example/configs/event_configs.yaml#L217) sets a code part column to null, as in the example below:

meas_chartevents_main:
  description: ["omop_concept_name", "label"] # List of strings are columns to be collated
  itemid: "itemid (omop_source_code)"
  parent_codes: "{omop_vocabulary_id}/{omop_concept_code}"
  valueuom: null

then the system finds, within the set of allowed codes, every code that would match the code constructed for the surrounding event when `valueuom` is set to null, and takes a cross-product join between the metadata rows and all matching codes.

This leaves it up to the user to identify, on a case-by-case basis, which parts of the code are subsidiary. It will be trickier to support a more expressive code parser language in the future, because this approach only works if we can deconstruct realized codes into code parts, but that's OK for now.
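
As a rough illustration of the matching described above, here is a minimal plain-Python sketch. It is not the `extract_code_metadata` implementation: the `//` separator, the part names, the helper names (`deconstruct`, `join_partial`), and the example codes and descriptions are all assumptions made for illustration. A part set to `None` in a metadata row is treated as a wildcard, and each metadata row is joined to every allowed code that agrees on the remaining parts.

```python
from __future__ import annotations

SEP = "//"  # ASSUMPTION: code parts are joined with this separator


def deconstruct(code: str, part_names: list[str]) -> dict[str, str] | None:
    """Split a realized code into named parts; None if the shape does not fit."""
    values = code.split(SEP)
    if len(values) != len(part_names):
        return None
    return dict(zip(part_names, values))


def join_partial(
    allowed_codes: list[str],
    part_names: list[str],
    metadata_rows: list[dict],
) -> list[dict]:
    """Join each metadata row to every allowed code matching its non-null parts."""
    joined = []
    for row in metadata_rows:
        for code in allowed_codes:
            parts = deconstruct(code, part_names)
            if parts is None:
                continue
            # A part set to None (null in the YAML) matches any value.
            if all(v is None or parts[k] == v for k, v in row["parts"].items()):
                joined.append({"code": code, **row["metadata"]})
    return joined


# Hypothetical data: the codes and the description are made up.
allowed_codes = ["LAB//1001//bpm", "LAB//1001//UNK", "LAB//2002//mmHg"]
part_names = ["prefix", "itemid", "valueuom"]
metadata_rows = [
    {
        "parts": {"prefix": "LAB", "itemid": "1001", "valueuom": None},
        "metadata": {"description": "Heart rate (illustrative)"},
    }
]

matched = join_partial(allowed_codes, part_names, metadata_rows)
for record in matched:
    print(record)
```

The sketch also makes the limitation concrete: `deconstruct` only works because every code splits cleanly into a fixed list of parts, which a more expressive code parser language would not guarantee.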

mmcdermott commented 2 weeks ago

This is not actively causing any issues given #156 has been resolved, so I've lowered the priority.