A simple set of MEDS polars-based ETL and transformation functions
MIT License
`values/sum_sqd` and possibly `values/sum` may overflow. We should consider adapting the aggregation space to work in the `values/mean` and `values/variance` space instead. #111
This would require reworking `aggregate_code_metadata.py` to support and recognize dependencies between aggregations. For example, `values/mean` depends on `values/n_occurrences`, and `values/variance` depends on both `values/mean` and `values/n_occurrences`. When these statistics are computed in a sharded manner, the intermediate statistics of each shard must be kept so that the true aggregate values can be computed during the reduce step.
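Recognizing these dependencies amounts to resolving a small dependency graph before any aggregation runs. The sketch below assumes a hypothetical `AGG_DEPENDENCIES` mapping and a hypothetical `resolve_aggregations` helper. Neither is part of `aggregate_code_metadata.py` today. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), so requesting `values/variance` also schedules its dependencies, and in the right order.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each aggregation lists the aggregations whose
# per-shard values it needs in order to be combined during the reduce step.
AGG_DEPENDENCIES: dict[str, set[str]] = {
    "values/n_occurrences": set(),
    "values/mean": {"values/n_occurrences"},
    "values/variance": {"values/mean", "values/n_occurrences"},
}


def resolve_aggregations(requested: list[str]) -> list[str]:
    """Return the requested aggregations plus all of their (transitive)
    dependencies, ordered so that every dependency comes first."""
    needed: set[str] = set()
    stack = list(requested)
    while stack:
        agg = stack.pop()
        if agg in needed:
            continue
        if agg not in AGG_DEPENDENCIES:
            raise KeyError(f"Unknown aggregation: {agg}")
        needed.add(agg)
        stack.extend(AGG_DEPENDENCIES[agg])
    # Restrict the graph to the aggregations actually needed, then sort it.
    graph = {agg: AGG_DEPENDENCIES[agg] for agg in needed}
    return list(TopologicalSorter(graph).static_order())
```

Because `TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, a misconfigured dependency map fails loudly instead of silently producing a wrong reduce order.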
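The reduce step in the mean/variance space can use the standard pairwise update of Chan, Golub and LeVeque. Each shard keeps only its count, mean and variance. Two shards are combined through their sums of squared deviations rather than through raw sums of squares. The sketch below is a minimal version under stated assumptions. `ShardStats`, `shard_stats` and `merge` are hypothetical names, not the repository's API. Variance is taken to be the population variance (divide by `n`). Plain Python floats stand in for polars columns.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ShardStats:
    """Hypothetical per-shard summary in the mean/variance space."""
    n: int            # plays the role of values/n_occurrences
    mean: float       # plays the role of values/mean
    variance: float   # plays the role of values/variance (population variance)


def shard_stats(values: list[float]) -> ShardStats:
    """Map step: summarize one shard without ever forming a sum of squares."""
    n = len(values)
    if n == 0:
        return ShardStats(0, 0.0, 0.0)
    mean = math.fsum(values) / n
    variance = math.fsum((v - mean) ** 2 for v in values) / n
    return ShardStats(n, mean, variance)


def merge(a: ShardStats, b: ShardStats) -> ShardStats:
    """Reduce step: combine two shard summaries (Chan et al. pairwise update)."""
    n = a.n + b.n
    if n == 0:
        return ShardStats(0, 0.0, 0.0)
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    # Sum of squared deviations (M2) of the union = M2_a + M2_b + correction term.
    m2 = a.variance * a.n + b.variance * b.n + delta * delta * a.n * b.n / n
    return ShardStats(n, mean, m2 / n)
```

Usage: `reduce(merge, map(shard_stats, shards))` yields the same count, mean and variance as summarizing all values at once. This is why `values/variance` cannot be reduced without `values/mean` and `values/n_occurrences` from every shard.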