mmcdermott
/
MEDS_transforms
A simple set of MEDS polars-based ETL and transformation functions
MIT License
15
stars
3
forks
issues
Dropping nulls and making the dataframe unique could be done once and shared across all time-dependent functors.
#152
mmcdermott
opened
1 month ago
0
Release 0.0.5
#151
mmcdermott
closed
1 month ago
1
Don't write any sub-sharded files at all; just compute them and pass them through.
#150
mmcdermott
closed
1 month ago
1
Tokenization tensorization documentation
#149
mmcdermott
closed
1 month ago
1
We need to be able to support joining on metadata based on partial code matches (e.g., no `valueuom`).
#148
mmcdermott
opened
1 month ago
2
Normalization stage is checking for aggregate_code_metadata/codes.parquet columns and metadata/codes.parquet columns in data/codes.parquet
#147
Oufattole
closed
1 month ago
5
Release candidate for 0.0.4
#146
mmcdermott
closed
1 month ago
1
reshard stage code is very messy and really stretches the limits of this "MR" library's API.
#145
mmcdermott
opened
1 month ago
0
Integration tests should run in parallel mode as well as in serial mode.
#144
mmcdermott
closed
2 weeks ago
1
Fixes brittle reduce stage checking in Reshard stage.
#143
mmcdermott
closed
1 month ago
3
`hydra_loguru_init` only captures a portion of the logging that happens in the code.
#142
mmcdermott
opened
1 month ago
0
Logging strings should indicate what worker they belong to.
#141
mmcdermott
opened
1 month ago
0
Logging may be misconfigured for importing this package as a library.
#140
mmcdermott
opened
1 month ago
1
`reshard_to_split` is buggy when run in parallel.
#139
mmcdermott
closed
1 month ago
1
Release candidate for 0.0.4
#138
mmcdermott
closed
1 month ago
3
Improved compliance by removing creation of shards.json file and adding `patient_splits.parquet` file.
#137
mmcdermott
closed
1 month ago
1
Transformation testing code is not necessarily fully MEDS v0.3 compatible.
#136
mmcdermott
closed
2 weeks ago
0
Adds a reshard-by-split stage as a transform.
#135
mmcdermott
closed
1 month ago
2
Add a "reshard_by_split" stage that reshards a MEDS dataset into shards that subdivide splits via `metadata/patient_splits.parquet`
#134
mmcdermott
closed
1 month ago
2
This moves the parser to a general import, and absorbs some type checking commits from another PR to clean up the PR.
#133
mmcdermott
closed
1 month ago
1
Removed unnecessary usage of the `splits.json` file that MEDS-Extract writes during sharding.
#132
mmcdermott
closed
1 month ago
2
Add .editorconfig
#131
prenc
closed
1 month ago
6
Make it such that the `external_splits` specification can point to a `patient_splits.parquet` file or a prior `splits.json` file from MEDS-Extract to match the cohort.
#130
mmcdermott
opened
1 month ago
0
Pipelines should automatically determine shards from the input directory rather than relying on the `splits.json` file.
#129
mmcdermott
closed
1 month ago
0
Adds readthedocs via `mkdocs`. Still in progress.
#128
mmcdermott
closed
1 month ago
2
`aggregate_code_metadata` is currently crashing if the input data doesn't have a `numeric_value` column.
#127
mmcdermott
closed
2 weeks ago
1
Pipeline should throw a warning if there are deprecated column names and not current column names in event config
#126
mmcdermott
opened
1 month ago
1
Extraction ETL crashes if you include the `extract_metadata` stage but you don't have any `_metadata` blocks in your configs.
#125
mmcdermott
closed
1 month ago
4
Fix split_and_shard_patients when the full split definition is provided
#124
prenc
closed
1 month ago
2
Updated workflows and README.
#123
mmcdermott
closed
1 month ago
2
Tokenization & Tensorization Updates
#122
mmcdermott
opened
1 month ago
0
Adds an 'extract_values' transform to extract values and retype them from input MEDS data.
#121
mmcdermott
closed
2 weeks ago
3
Add transformation for injecting time-interval codes based on config specification
#120
prenc
opened
1 month ago
15
Adds the ability to specialize existing transformations to be applied with separate configurations to different portions of the data based on configurable filters.
#119
mmcdermott
closed
1 month ago
3
Ensure that documentation specifies that all stages that rely on `metadata/codes.parquet` having all codes should run an explicit aggregation first.
#118
mmcdermott
opened
1 month ago
0
Decide how to handle stages that require metadata to contain all codes in the (train set of the) dataset.
#117
mmcdermott
opened
1 month ago
1
Various transforms that rely on the `metadata/codes.parquet` file may require all codes present in the data to be in that dataframe.
#116
mmcdermott
opened
1 month ago
0
Release 0.0.3 -- ensures MIMIC-IV ETL can be run without `rootutils` and that it works with unzipped files.
#115
mmcdermott
closed
1 month ago
2
MIMIC-IV ETL does not work unless you clone the GitHub repository.
#114
mmcdermott
closed
2 weeks ago
1
The paths read by the MIMIC-IV code may unnecessarily depend on the data being in a versioned subdir
#113
mmcdermott
opened
1 month ago
0
Consider adjusting the split fractions for MIMIC-IV and eICU
#112
mmcdermott
opened
1 month ago
0
`values/sum_sqd` and possibly `values/sum` may overflow. We should consider adapting the aggregation space to work in the `values/mean` and `values/variance` space instead.
#111
mmcdermott
opened
1 month ago
0
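A rough sketch of what aggregating in mean/variance space could look like for the issue above, assuming each shard reports a count, a running mean, and a sum of squared deviations from that mean instead of raw `values/sum` and `values/sum_sqd`. The names `Moments`, `from_values`, and `merge` are hypothetical and are not part of MEDS_transforms; the update rules are the standard Welford and pairwise-merge formulas. This reduces the risk of overflow for large values but does not rule it out for widely spread data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Moments:
    """Hypothetical per-code aggregate: count, mean, and sum of squared deviations."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum over values of (x - mean) ** 2

    @property
    def variance(self) -> float:
        # Population variance; NaN when no values have been seen.
        return self.m2 / self.n if self.n else float("nan")


def from_values(values: Iterable[float]) -> Moments:
    """Build the aggregate for one shard with Welford's streaming update."""
    acc = Moments()
    for x in values:
        n = acc.n + 1
        delta = x - acc.mean
        mean = acc.mean + delta / n
        acc = Moments(n, mean, acc.m2 + delta * (x - mean))
    return acc


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two shard aggregates without ever forming a raw sum of squares."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)
```

For example, `merge(from_values(shard_a), from_values(shard_b))` gives the same mean and variance as aggregating the concatenated values, so a reduce step over shards could fold `merge` across the per-shard results.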
The default extraction ETL should likely not include an `aggregate_code_metadata.py` stage, unless anyone thinks it would be almost universally useful.
#110
mmcdermott
closed
1 month ago
3
Documentation needs to be improved to explain what the `null` code row means for the output of `aggregate_code_metadata.py` and how to disable it.
#109
mmcdermott
opened
1 month ago
0
Update README.md
#108
EthanSteinberg
closed
4 weeks ago
2
Allow `map_over` to accept compute functors or partials with standardized args rather than merely direct functions.
#107
mmcdermott
closed
1 month ago
2
`description_separator` is never used in `extract_code_metadata.py` integration tests.
#106
mmcdermott
opened
1 month ago
0
Removes the unused ability to use sequences of compute functions.
#105
mmcdermott
closed
1 month ago
1
Supporting tuples of `compute_fn` to `map_over` and `rwlock_wrap` may be unnecessary complexity
#104
mmcdermott
closed
1 month ago
0
Death `time` columns that are reported with a 24h time of 00:00 in MIMIC-IV should likely be pushed back to 23:59
#103
mmcdermott
opened
1 month ago
0