rvandewater / YAIB

🧪Yet Another ICU Benchmark: a holistic framework for the standardization of clinical prediction model experiments. Provide custom datasets, cohorts, prediction tasks, endpoints, preprocessing, and models. Paper: https://arxiv.org/abs/2306.05109
https://github.com/rvandewater/YAIB/wiki
MIT License

Check if order is preserved #128

Closed rvandewater closed 1 year ago

rvandewater commented 1 year ago

Time order should be preserved for traditional ML methods: https://github.com/rvandewater/YAIB/blob/8a56504b8be2ce05da4fedac8116084779cabd56/icu_benchmarks/data/loader.py#L147

rvandewater commented 1 year ago

@mlondschien it seems after testing that the order is preserved at this step (also owing to the sort=False). Let me know if your experience is different.

mlondschien commented 1 year ago

IIUC sort=False results in pandas not sorting the groups by the groupby keys. My concern was about the order within the groups, as you are extracting the "last" element via .last(). Is the "last" row within a group always equal to the row with maximal charttime?
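To make the question concrete, here is a minimal sketch on toy data. The column names `stay_id`, `charttime`, and `hr` are assumptions for illustration and are not taken from the YAIB loader. It suggests that `.last()` follows row order within each group, so it matches the row with maximal time only if the frame is already time-ordered.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative, not YAIB's schema.
df = pd.DataFrame(
    {
        "stay_id": [2, 2, 1, 1, 1],
        "charttime": [1, 0, 0, 2, 1],  # deliberately not time-ordered
        "hr": [80.0, 85.0, 60.0, 62.0, 61.0],
    }
)

# sort=False only affects the order of the groups in the result
# (order of first appearance instead of sorted keys).
# .last() takes the last non-null value per column in row order within each group.
last_rows = df.groupby("stay_id", sort=False).last()

# Sorting by time first makes "last row" and "row with maximal time" coincide.
latest_rows = (
    df.sort_values(["stay_id", "charttime"])
    .groupby("stay_id", sort=False)
    .last()
)

print(last_rows)
print(latest_rows)
```

On this unsorted input the two results differ, which is why the order within groups matters independently of `sort=False`.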

rvandewater commented 1 year ago

The features_df should be ordered by time, yes. There might be a better way of doing it, but with the following command, it extracts the last row, as you can see when using count historical feature generation:

train -d demo_data/mortality24/mimic_demo -t BinaryClassification --log-dir ../yaib_logs/mortality --tune -m LGBMClassifier -s 1111 --checkpoint test

[image]

Perhaps I could add a check there to ensure that time is increasing within each group before the time column is removed.
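Such a check could look roughly like the sketch below. The helper name `assert_time_ordered` and the column names are hypothetical and not part of YAIB. It tests for non-decreasing time within each group, so repeated timestamps pass.

```python
import pandas as pd


def assert_time_ordered(df: pd.DataFrame, group_col: str, time_col: str) -> None:
    """Raise ValueError if time_col is not non-decreasing within any group."""
    # One boolean per group: True when the group's time values never decrease.
    ordered = df.groupby(group_col, sort=False)[time_col].apply(
        lambda s: s.is_monotonic_increasing
    )
    bad_groups = ordered[~ordered].index.tolist()
    if bad_groups:
        raise ValueError(
            f"{time_col} is not ordered within groups: {bad_groups[:5]}"
        )


# Hypothetical usage before the time column is dropped.
ordered_df = pd.DataFrame({"stay_id": [1, 1, 2, 2], "time": [0, 1, 0, 1]})
assert_time_ordered(ordered_df, "stay_id", "time")  # passes silently

unordered_df = pd.DataFrame({"stay_id": [1, 1, 2, 2], "time": [0, 1, 1, 0]})
try:
    assert_time_ordered(unordered_df, "stay_id", "time")
    raised = False
except ValueError:
    raised = True
```

Failing loudly here seems preferable to silently re-sorting, since an unexpected order would point to a problem earlier in preprocessing.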