
Generating Patient Cohorts for 🧪 Yet Another ICU Benchmark

This repo uses the `ricu` R package to derive patient cohorts for prediction tasks from the following intensive care databases:

Dataset	MIMIC-III / IV	eICU-CRD	HiRID	AUMCdb
Admissions	40k / 73k	200k	33k	23k
Version	v1.4 / v2.2	v2.0	v1.1.1	v1.0.2
Frequency (time-series)	1 hour	5 minutes	2 / 5 minutes	up to 1 minute
Originally published	2015 / 2020	2017	2020	2019
Origin	USA	USA	Switzerland	Netherlands

New datasets can also be added. We are currently working on a package to make this process as smooth as possible.

We provide five common tasks for clinical prediction by default:

No	Task	Frequency	Type
1	ICU Mortality	Once per Stay (after 24H)	Binary Classification
2	Acute Kidney Injury (AKI)	Hourly (within 6H)	Binary Classification
3	Sepsis	Hourly (within 6H)	Binary Classification
4	Kidney Function (KF)	Once per Stay	Regression
5	Length of Stay (LoS)	Hourly (within 7D)	Regression
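
The "Hourly (within 6H)" frequency in the table above can be read as one label per hour, flagging whether the event starts within the next six hours. The sketch below illustrates that reading only. The interpretation, the function name `hourly_horizon_labels`, and the toy values are assumptions made for illustration, not the definitions used by the R code in this repo.

```python
from typing import List, Optional


def hourly_horizon_labels(
    stay_hours: int, onset_hour: Optional[int], horizon: int = 6
) -> List[int]:
    """Return one label per hour of the stay (hypothetical helper).

    Hour t is labelled 1 if the event onset lies in (t, t + horizon],
    i.e. strictly after t and at most `horizon` hours ahead; otherwise 0.
    A stay without an event (onset_hour is None) gets all zeros.
    """
    labels = []
    for t in range(stay_hours):
        positive = onset_hour is not None and 0 < onset_hour - t <= horizon
        labels.append(int(positive))
    return labels


# Toy stay of 12 hours with an event starting at hour 9.
labels = hourly_horizon_labels(stay_hours=12, onset_hour=9)
# Toy stay of 4 hours without any event.
no_event = hourly_horizon_labels(stay_hours=4, onset_hour=None)
print(labels)
print(no_event)
```

Under this assumed definition, hours 3 to 8 are positive in the first toy stay, and the hours at or after onset are not labelled positive.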

New tasks can be easily added. The following repositories may be relevant as well:

YAIB: Main repository for YAIB.
YAIB-models: Pretrained models for YAIB.
ReciPys: Preprocessing package for YAIB pipelines.

📄 Paper

If you use this code in your research, please cite the following publication:

@article{vandewaterYetAnotherICUBenchmark2023,
    title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
    shorttitle = {Yet Another ICU Benchmark},
    url = {http://arxiv.org/abs/2306.05109},
    language = {en},
    urldate = {2023-06-09},
    publisher = {arXiv},
    author = {van de Water, Robin and Schmidt, Hendrik and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
    month = jun,
    year = {2023},
    note = {arXiv:2306.05109 [cs]},
    keywords = {Computer Science - Machine Learning},
}

This paper can also be found on arXiv: https://arxiv.org/pdf/2306.05109.pdf

To replicate the cohorts:

Run the following commands to clone this repo:

git clone https://github.com/rvandewater/YAIB-cohorts.git
cd YAIB-cohorts

Once you have cloned the repo, all cohorts can be created directly from within R or via an interface from Python. Instructions for each can be found at:

R: README.md
Python: README.md

Note: due to some recent bug fixes in ricu, the extracted cohorts might differ marginally from those published in the benchmarking paper.

Clairvoyance Conversion

To output the cohorts in the Clairvoyance (https://github.com/vanderschaarlab/clairvoyance) format, you can use the following utils.py function:

output_clairvoyance(data_dir, save_dir, task_type="static")

You can specify the size, the task type, and the train/test split in the make_train_test function. The task type is either "static", i.e., one outcome label per stay_id (mortality, KF), or "dynamic", i.e., one outcome label per time step (sepsis, AKI, LoS).
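
To make the difference between the two task types concrete, here is a minimal sketch of the two label layouts together with a split at the stay level. It does not use or reproduce utils.py: the helper `split_stay_ids`, its parameters, and all toy values are hypothetical, and the actual behaviour of make_train_test may differ.

```python
import random
from typing import Dict, Iterable, List, Tuple


def split_stay_ids(
    stay_ids: Iterable[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split stay ids into train and test lists (hypothetical helper).

    Splitting by stay keeps all time steps of one stay on the same side,
    which matters for dynamic tasks with many labels per stay.
    """
    ids = sorted(set(stay_ids))
    random.Random(seed).shuffle(ids)  # seeded, so the split is reproducible
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


# "static": one outcome label per stay_id (toy values).
static_labels: Dict[int, int] = {101: 0, 102: 1, 103: 0, 104: 0, 105: 1}

# "dynamic": one outcome label per time step of each stay (toy values).
dynamic_labels: Dict[int, List[int]] = {
    101: [0, 0, 0],
    102: [0, 1, 1],
    103: [0, 0],
    104: [0, 0, 0, 0],
    105: [0, 0, 1],
}

train_ids, test_ids = split_stay_ids(static_labels)

# The same stay-level split applies to both layouts.
train_static = {s: static_labels[s] for s in train_ids}
train_dynamic = {s: dynamic_labels[s] for s in train_ids}
print(len(train_ids), len(test_ids))
```

The sketch splits on stay_id rather than on individual rows so that, for dynamic tasks, no stay contributes time steps to both the training and the test set.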

Acknowledgements

The code in this repository heavily utilises the ricu R package, without which deriving these cohorts would have been much more difficult. If you use the code in this repo, please go give their repo a star :)

This repo is based on earlier work by Rockenschaub et al. (2023), which can be found at https://github.com/prockenschaub/icuDG-preprocessing

License

This source code is released under the MIT license, included here.
