rvandewater / YAIB-cohorts

🏥Generate task cohort for the YAIB framework.
https://github.com/stars/rvandewater/lists/yaib
MIT License

How to process MIMIC-IV 2.0 or 2.2 #5

Open anothersin opened 10 months ago

anothersin commented 10 months ago

Hi, thanks for the great package!

I tried to reproduce the data preprocessing with MIMIC-IV 2.0, but only 24 tables were processed and the data could not be imported into ricu. ricu seems to be designed for MIMIC-IV 1.0. How can I change the code to make it compatible with MIMIC-IV 2.0?

prockenschaub commented 10 months ago

Hi, thanks for reaching out. We are in contact with the original ricu developers and are actively working on this issue, which seems to affect several people. We hope to resolve it soon, but please bear with me: it may take another couple of days or weeks. To help me help you, can you describe in more detail where it fails? I have a local copy of MIMIC-IV 2.0 that has only 27 of the 28 tables (the omr table is missing) but can still be loaded by ricu.

anothersin commented 10 months ago

Thank you for your reply. We have tried several things, such as changing the ricu version and re-downloading the MIMIC-IV 2.0 dataset, but none of them solved the problem.

The following results are based on ricu 0.5.3. They indicate that the miiv source in our ricu installation expects at most 27 tables, of which we can only import 24.

We ran the following in the R console:

> import_src('miiv', './local_data/miiv')
 Importing 24 tables for `miiv` 
Error in pb_tick(self, private, len, tokens) :
  not all !self$finished are TRUE
In addition: There were 21 warnings (use warnings() to see them)
> warnings()
warnings messages:
1: expected 5280351 rows but got 5006884 rows for table `diagnoses_icd`
2: expected 1630 rows but got 1623 rows for table `d_labitems`
3: expected 769622 rows but got 636157 rows for table `drgcodes`
4: expected 55947921 rows but got 57469291 rows for table `emar_detail`
5: expected 27464367 rows but got 28189413 rows for table `emar`
6: expected 160727 rows but got 159156 rows for table `hcpcsevents`
7: expected 122103667 rows but got 124342638 rows for table `labevents`
8: expected 3397914 rows but got 3395229 rows for table `microbiologyevents`
9: expected 14736386 rows but got 14291703 rows for table `pharmacy`
10: expected 3256358 rows but got 3174971 rows for table `poe_detail`
11: expected 42483962 rows but got 41427803 rows for table `poe`
12: expected 17008053 rows but got 16219412 rows for table `prescriptions`
13: expected 779625 rows but got 704124 rows for table `procedures_icd`
14: expected 562892 rows but got 492967 rows for table `services`
15: expected 329499788 rows but got 329822285 rows for table `chartevents`
16: expected 7495712 rows but got 7477876 rows for table `datetimeevents`
17: expected 3861 rows but got 4014 rows for table `d_items`
18: expected 76540 rows but got 76943 rows for table `icustays`
19: expected 9460658 rows but got 9442345 rows for table `inputevents`
20: expected 4457381 rows but got 4450049 rows for table `outputevents`
21: expected 731247 rows but got 731788 rows for table `procedureevents`
>

The logs from ricu are as follows:

The following data sources are configured to be attached:
(the environment variable `RICU_SRC_LOAD` controls this)

✔ mimic: 26 of 26 tables available
✖ mimic_demo: 0 of 25 tables available
✔ eicu: 31 of 31 tables available
✖ eicu_demo: 0 of 31 tables available
✖ hirid: 0 of 5 tables available
✖ aumc: 0 of 7 tables available
✖ miiv: 24 of 27 tables available

Appreciate your help.

prockenschaub commented 10 months ago

Thanks for the extra information! I think I have now figured out where the problem lies: the data-sources.json config file in the ricu fork used by YAIB still had the old path prefix core/ from MIMIC-IV 1.0 for the tables admissions, patients, and transfers. This must have been unintentionally reverted at some point during the development of YAIB.
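To check a local copy of the config for this kind of stale prefix, a minimal sketch along the following lines may help. It is written in Python and does not rely on ricu itself; it simply walks any JSON document and reports string values that start with core/. The function names `find_prefixed_paths` and `stale_paths_in_file` are hypothetical helpers, and no assumption is made about the exact layout of data-sources.json beyond it being valid JSON.

```python
import json
from pathlib import Path


def find_prefixed_paths(node, prefix="core/", trail=()):
    """Recursively yield (key trail, value) for each string value starting with `prefix`.

    Works on any parsed JSON structure, so it does not depend on how the
    config nests its table definitions.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_prefixed_paths(value, prefix, trail + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_prefixed_paths(value, prefix, trail + (str(index),))
    elif isinstance(node, str) and node.startswith(prefix):
        yield trail, node


def stale_paths_in_file(config_path, prefix="core/"):
    """Load a JSON config file and list every value that still uses `prefix`."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return list(find_prefixed_paths(config, prefix))


# Example call (the path is an assumption, adjust it to your checkout):
# for trail, value in stale_paths_in_file("inst/extdata/config/data-sources.json"):
#     print(" > ".join(trail), "->", value)
```

An empty result would suggest that the copy of the config you are looking at no longer references the old prefix.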

I have updated the monkeypatch branch of the ricu fork currently used by YAIB: https://github.com/prockenschaub/ricu-package/blob/monkeypatch/inst/extdata/config/data-sources.json. I also raised PR #6 to include this update in the renv lock file, which should allow installing the right ricu version via renv::restore().
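If it is unclear which ricu build a checkout will install, one option is to look at the ricu entry in renv.lock before restoring. The sketch below is a Python helper, not part of YAIB or renv; `locked_package` is a hypothetical name, and it assumes the usual renv.lock layout with a top-level "Packages" object whose GitHub-sourced entries carry fields such as RemoteRef and RemoteSha.

```python
import json
from pathlib import Path

# Fields that identify where a locked package comes from.
# GitHub-sourced entries usually carry the Remote* fields; CRAN entries may not.
FIELDS = ("Version", "Source", "RemoteUsername", "RemoteRepo", "RemoteRef", "RemoteSha")


def locked_package(lock, name):
    """Return the identifying fields of one package in a parsed renv.lock, or None."""
    entry = lock.get("Packages", {}).get(name)
    if entry is None:
        return None
    return {field: entry[field] for field in FIELDS if field in entry}


# Example call (assumes renv.lock sits in the current working directory):
# lock = json.loads(Path("renv.lock").read_text(encoding="utf-8"))
# print(locked_package(lock, "ricu"))
```

Comparing the reported RemoteRef and RemoteSha with the fork's branch shows whether the lock file already points at the updated config.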

Obviously, working off a fork is suboptimal. We are therefore actively working with the original ricu developers to update the MIMIC-IV version in main ricu and to integrate all the other improvements and bug fixes made on the fork. Due to some project deadlines, however, it may still take another month or two until this is fully done. Please be patient with us.

I hope you now have everything you need to use YAIB with MIMIC-IV 2.0 in the meantime, but do let me know if you run into any further problems.

prockenschaub commented 10 months ago

P.S.: you can ignore the warnings regarding the expected vs. actual rows. ricu stores the expected number of rows for each table in the config and raises a warning if the imported number of rows does not match it. The numbers currently in the config still relate to version 1.0, so disagreements are expected. The warning will be fixed when we update the main ricu package.
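To get a feel for how far the imported row counts are from the configured ones, the warning text can be parsed outside of R. The following is a rough Python sketch; `row_count_drift` is a hypothetical helper, and the pattern only assumes the warning wording shown earlier in this thread.

```python
import re

# Matches lines such as:
#   2: expected 1630 rows but got 1623 rows for table `d_labitems`
ROW_WARNING = re.compile(
    r"expected (?P<expected>\d+) rows but got (?P<actual>\d+) rows "
    r"for table `(?P<table>[^`]+)`"
)


def row_count_drift(warning_text):
    """Return {table: (expected, actual, relative difference)} from warning text."""
    drift = {}
    for match in ROW_WARNING.finditer(warning_text):
        expected = int(match["expected"])
        actual = int(match["actual"])
        # Relative difference with respect to the configured (expected) count.
        relative = (actual - expected) / expected if expected else float("nan")
        drift[match["table"]] = (expected, actual, relative)
    return drift


if __name__ == "__main__":
    sample = (
        "2: expected 1630 rows but got 1623 rows for table `d_labitems`\n"
        "18: expected 76540 rows but got 76943 rows for table `icustays`\n"
    )
    for table, (expected, actual, relative) in row_count_drift(sample).items():
        print(f"{table}: {expected} -> {actual} ({relative:+.2%})")
```

Small relative differences are consistent with a version mismatch in the config. A table that comes back with a tiny fraction of its expected rows would be worth a closer look, for example for a truncated download.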