mmcdermott / MEDS_transforms

A simple set of MEDS polars-based ETL and transformation functions
MIT License

The default extraction ETL should likely not include an `aggregate_code_metadata.py` stage, unless anyone thinks it would be almost universally useful. #110

Closed mmcdermott closed 1 month ago

mmcdermott commented 1 month ago

This means aggregate columns like `code/n_occurrences`, `value/sum`, etc. would no longer be computed by default during extraction. Code metadata (e.g., `description`, `parent_codes`) would still be included by default.
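For readers unfamiliar with what these aggregate columns hold, here is a minimal sketch in plain Python. It is not the code in `aggregate_code_metadata.py` (which is polars-based); the function name `toy_aggregate`, the toy events, and the `numeric_value` column name are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy event rows; "code" and "numeric_value" are assumed column names.
events = [
    {"code": "HR", "numeric_value": 80.0},
    {"code": "HR", "numeric_value": 90.0},
    {"code": "DX//I10", "numeric_value": None},
]


def toy_aggregate(rows):
    """Compute per-code occurrence counts and value sums (illustrative only)."""
    out = defaultdict(lambda: {"code/n_occurrences": 0, "value/sum": 0.0})
    for row in rows:
        agg = out[row["code"]]
        # Every row counts as one occurrence of its code.
        agg["code/n_occurrences"] += 1
        # Only rows that carry a numeric value contribute to the sum.
        if row["numeric_value"] is not None:
            agg["value/sum"] += row["numeric_value"]
    return dict(out)


code_metadata = toy_aggregate(events)
```

Statistics like these depend on the data rather than on the code vocabulary, which is why they can be deferred to a pre-processing pipeline, while static fields such as `description` and `parent_codes` can still be emitted at extraction time.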

Does anyone see any reason why this aggregation stage should be included by default during extraction to MEDS? (The stage will, of course, still be usable in pre-processing pipelines.)

Tagging @EthanSteinberg, @Oufattole, @prenc, @prockenschaub, @tompollard for input.

Per discussion below, there are two planned tasks:

EthanSteinberg commented 1 month ago

I'd be in favor of removing these from the default to keep the reference MIMIC ETL as simple as possible.

prockenschaub commented 1 month ago

I second that: the simpler the better, plus clear documentation on when and why you'd want to do the aggregation.

mmcdermott commented 1 month ago

I agree and will plan on removing it at some point soon. This does raise some further questions (both about MEDS and other pipelines, in particular #116/#117), but I think for now the right move is to: