Open rvandewater opened 1 day ago
The changes introduce several new configuration files, scripts, and a README for the AUMC Example and MIMIC-IV Example projects. These updates include detailed instructions for dataset extraction, structured schemas for patient data management, and enhancements for data processing workflows. The scripts facilitate the conversion of raw data into structured formats, incorporating error handling and parallel processing capabilities. Additionally, modifications to existing files clarify directory paths and improve user guidance on handling data.
| Files | Change Summary |
|---|---|
| `AUMC_Example/README.md` | Added comprehensive instructions for extracting MEDS datasets from AUMCdb, including installation, dataset downloading, and ETL process execution. |
| `AUMC_Example/configs/*.yaml` | Introduced multiple configuration files defining schemas for patient data, managing input/output directories, and preprocessing steps for various medical data items. |
| `AUMC_Example/joint_script.sh` | Added a script for processing AUMCdb data with error handling, help messages, and a defined workflow for converting raw data into a MEDS cohort. |
| `AUMC_Example/local_parallelism_runner.yaml` | Added configuration options for parallel processing, specifying worker parameters and the use of the Joblib library. |
| `AUMC_Example/pre_MEDS.py` | Introduced a Python script for data wrangling, utilizing Polars for data manipulation and Hydra for configuration management. |
| `AUMC_Example/run.sh` | Added a script for processing MIMIC-IV data with error handling and unzipping options, ensuring proper execution of the data processing pipeline. |
| `MIMIC-IV_Example/README.md` | Clarified directory paths for storing MIMIC-IV data and updated commands for running the MEDS ETL process. |
| `MIMIC-IV_Example/run.sh` | Enhanced validation checks for positional arguments to prevent incorrect usage of the script. |
sequenceDiagram
participant User
participant Script
participant DataProcessor
participant Config
User->>Script: Run data processing command
Script->>Config: Load configurations
Script->>DataProcessor: Process raw data
DataProcessor->>DataProcessor: Convert to pre-MEDS format
DataProcessor->>DataProcessor: Shard and merge data
DataProcessor->>Script: Return processed data
Script->>User: Output results
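The flow in the diagram above (load configuration, convert to a pre-MEDS form, shard, merge) can be illustrated with a small standard-library sketch. This is not the code from this PR: the real `pre_MEDS.py` uses Polars and Hydra, and every name below (`to_pre_meds`, `shard_events`, `merge_shards`, `run_pipeline`, the `n_shards` key, and the row fields) is a hypothetical stand-in chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical event record: (patient_id, timestamp, code).
Event = tuple[int, int, str]


def to_pre_meds(raw_rows: Iterable[dict]) -> list[Event]:
    """Normalize raw rows into event tuples, skipping rows without a patient or time."""
    events: list[Event] = []
    for row in raw_rows:
        if row.get("patient_id") is None or row.get("time") is None:
            continue  # incomplete rows cannot be placed on a patient timeline
        events.append((int(row["patient_id"]), int(row["time"]), str(row.get("code", ""))))
    return events


def shard_events(events: Iterable[Event], n_shards: int) -> dict[int, list[Event]]:
    """Assign each event to a shard by patient id so one patient never spans shards."""
    shards: dict[int, list[Event]] = defaultdict(list)
    for event in events:
        shards[event[0] % n_shards].append(event)
    return dict(shards)


def merge_shards(shards: dict[int, list[Event]]) -> list[Event]:
    """Concatenate all shards and order events by patient, then by time."""
    merged = [event for shard_id in sorted(shards) for event in shards[shard_id]]
    return sorted(merged, key=lambda e: (e[0], e[1]))


def run_pipeline(raw_rows: Iterable[dict], config: dict) -> list[Event]:
    """Mirror the diagram: convert raw rows, shard them, then merge the shards."""
    events = to_pre_meds(raw_rows)
    shards = shard_events(events, config["n_shards"])
    return merge_shards(shards)
```

Sharding by patient id is one plausible choice here because it lets each shard be processed independently before the merge; the actual sharding strategy is defined by the configuration files in this PR.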
🐇 In the meadow, where data flows,
AUMC and MIMIC, where knowledge grows.
With scripts and configs, we dance and play,
Transforming raw numbers, come what may!
Hopping through datasets, so bright and new,
A rabbit's delight in the work we do! 🐇
Additionally, some fixes for the MIMIC-IV example