mmcdermott/MEDS_transforms
A simple set of MEDS polars-based ETL and transformation functions
MIT License
19 stars, 5 forks
issues
Adds the ability to specialize existing transformations to be applied with separate configurations to different portions of the data based on configurable filters.
#119
mmcdermott
closed
3 months ago
3
Ensure that documentation specifies that all stages that rely on `metadata/codes.parquet` having all codes should run an explicit aggregation first.
#118
mmcdermott
opened
3 months ago
0
Decide how to handle stages that require metadata to contain all codes in the (train set) of the dataset.
#117
mmcdermott
opened
3 months ago
1
Various transforms that rely on the `metadata/codes.parquet` file may require all codes present in the data to be in that dataframe.
#116
mmcdermott
opened
3 months ago
0
Release 0.0.3 -- ensures MIMIC-IV ETL can be run without `rootutils` and that it works with unzipped files.
#115
mmcdermott
closed
3 months ago
2
MIMIC-IV ETL does not work unless you clone the GitHub repository.
#114
mmcdermott
closed
2 months ago
1
The paths read by the MIMIC-IV code may unnecessarily depend on the data being in a versioned subdirectory
#113
mmcdermott
opened
3 months ago
0
Consider adjusting the split fractions for MIMIC-IV and eICU
#112
mmcdermott
opened
3 months ago
0
`values/sum_sqd` and possibly `values/sum` may overflow. We should consider adapting the aggregation space to work in the `values/mean` and `values/variance` space instead.
#111
mmcdermott
opened
3 months ago
0
The default extraction ETL should likely not include an `aggregate_code_metadata.py` stage, unless anyone thinks it would be almost universally useful.
#110
mmcdermott
closed
2 months ago
3
Documentation needs to be improved to explain what the `null` code row means for the output of `aggregate_code_metadata.py` and how to disable it.
#109
mmcdermott
opened
3 months ago
0
Update README.md
#108
EthanSteinberg
closed
2 months ago
2
Allow `map_over` to accept compute functors or partials with standardized args rather than merely direct functions.
#107
mmcdermott
closed
3 months ago
2
`description_separator` is never used in `extract_code_metadata.py` integration tests.
#106
mmcdermott
opened
3 months ago
0
Removes the unused ability to use sequences of compute functions.
#105
mmcdermott
closed
3 months ago
1
Supporting tuples of `compute_fn` to `map_over` and `rwlock_wrap` may be unnecessary complexity
#104
mmcdermott
closed
3 months ago
0
Death `time` columns that are reported with a 24h time of 00:00 in MIMIC-IV should likely be pushed back to 23:59
#103
mmcdermott
opened
3 months ago
0
`compute_fn`s passed to `map_over` helpers should be able to be functors that take in the config and stage config inside the `map_over` helper.
#102
mmcdermott
closed
3 months ago
1
Add match & revise syntax transformations
#101
mmcdermott
closed
3 months ago
4
Transforms need `main` function docstrings to get good script help messages.
#100
mmcdermott
opened
3 months ago
0
Once quantiles are added, we need to be able to support binning numerical values as well
#99
mmcdermott
opened
3 months ago
0
Support quantile computation over numerical values in a simple (though inefficient) manner.
#98
mmcdermott
closed
3 months ago
2
Release candidate v0.0.2: MEDS v0.3 Compatibility
#97
mmcdermott
closed
3 months ago
1
Use `MEDS_BIRTH` and `MEDS_DEATH` codes by default, per https://github.com/Medical-Event-Data-Standard/meds/blob/5f87c2fdcce7f8bab46af6f81ef7892fdee098c1/src/meds/schema.py#L26
#96
mmcdermott
closed
3 months ago
1
Update the eICU pipeline and get it fully working.
#95
mmcdermott
closed
3 months ago
1
Add a `finalize_MEDS_data.py` stage to validate and convert to the MEDS schema for the data files and ensure those work.
#94
mmcdermott
closed
3 months ago
1
This makes it so that if there is a dataset with a `metadata/patient_splits.parquet` file specified and a stage that specifies a restriction to a stage, it will filter the patients on the basis of that file.
#93
mmcdermott
closed
3 months ago
1
Given an input dataset with a `patient_splits.parquet` file, the data pre-processing steps should use that.
#92
mmcdermott
closed
3 months ago
0
Adds a finalize metadata stage that handles `dataset.json`, `codes.parquet`, and `patient_splits.parquet` retyping and MEDS verification.
#91
mmcdermott
closed
3 months ago
1
We need to use MEDS compliant code and split constants where possible
#90
mmcdermott
closed
3 months ago
0
Fixed schema of `parent_codes` and `description` in the `extract_code_metadata.py` script.
#89
mmcdermott
closed
3 months ago
1
If extract metadata is not run or aggregate metadata is not run, then an empty codes.parquet file should still be generated with the requisite (all empty) columns.
#88
mmcdermott
closed
3 months ago
0
Full Compatibility with MEDS v0.3.
#87
mmcdermott
closed
3 months ago
1
Updates file paths and pipeline configuration so final data and metadata outputs are written to MEDS v0.3 compatible file paths.
#86
mmcdermott
closed
3 months ago
1
We should set up automatic deployments based on GH workflows, as shown in the linked file.
#85
mmcdermott
closed
3 months ago
3
`parent_code` needs to be a list of strings in the code metadata schema and description a string.
#84
mmcdermott
closed
3 months ago
0
Converted codes to strings for MEDS v0.3 compliance.
#83
mmcdermott
closed
3 months ago
1
Support for different tensorization strategies
#82
Oufattole
closed
3 months ago
5
51 add aggregation for quantile computation of code values
#81
Oufattole
closed
3 months ago
5
Added additional terminology.
#80
mmcdermott
closed
3 months ago
1
Switch codes from categorical to string column types
#79
mmcdermott
closed
3 months ago
1
For now, move non-matched code parts to properties columns in raw MIMIC ETL
#78
mmcdermott
closed
3 months ago
0
Set up readthedocs
#77
mmcdermott
closed
3 months ago
0
Metadata extraction column joining needs to be able to omit parts of the code that aren't needed.
#76
mmcdermott
closed
3 months ago
1
Release 0.0.1 Candidate
#75
mmcdermott
closed
3 months ago
1
Update usage to reflect different usage modes: CLI use direct, CLI use through a custom pipeline, import as a library.
#74
mmcdermott
closed
3 months ago
1
Release 0.0.1 Tracker
#73
mmcdermott
closed
3 months ago
1
Polars performance warning on extract_code_metadata
#72
mmcdermott
opened
3 months ago
0
Update the MIMIC-IV example with the updated interface and installable options.
#71
mmcdermott
closed
3 months ago
2
Automatic determination of stage name not working after pip install on O2 servers
#70
mmcdermott
opened
3 months ago
0
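Issue #111 above suggests aggregating in `values/mean` and `values/variance` space rather than accumulating `values/sum` and `values/sum_sqd`. As a rough illustration of what that could look like, the sketch below keeps a count, a mean, and a sum of squared deviations per shard and merges shards with the standard pairwise (Chan et al.) formula. This is not taken from the repository: the `Moments` class and the `from_values` and `merge` helpers are hypothetical names, and the code uses plain Python rather than polars.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Moments:
    """Hypothetical aggregate: count, mean, and M2 (sum of squared deviations from the mean)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self) -> float:
        # Population variance; NaN for an empty aggregate.
        return self.m2 / self.n if self.n else float("nan")


def from_values(values: Iterable[float]) -> Moments:
    """Welford's single-pass update over one shard of values."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two shard-level aggregates without ever forming a raw sum of squares."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


# Two shards reduced independently, then merged.
merged = merge(from_values([1.0, 2.0, 3.0]), from_values([4.0, 5.0, 6.0, 7.0]))
```

Because `merge` is associative up to floating-point rounding, shard-level results could be combined in any order, which is the property a map-reduce style aggregation stage would need.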