ratschlab / HIRID-ICU-Benchmark

Repository for the HiRID ICU Benchmark (HiB) project
MIT License

Assertion Error while running pre-processing scripts #29

Open munibmesinovic opened 2 months ago

munibmesinovic commented 2 months ago

Hi! I ran into the following error when running the ICU benchmark preprocessing scripts on raw HiRID data with this command:

!python icu_benchmarks/run.py preprocess --hirid-data-root 'Data/' \
    --work-dir 'Preprocessed_Data' \
    --var-ref-path ./preprocessing/resources/varref.tsv \
    --split-path ./preprocessing/resources/split.tsv \
    --nr-workers 8

The error:

2024-04-07 09:35:41,037 - INFO: Generating extended general table in Preprocessed_Data/general_table_extended.parquet
2024-04-07 09:38:08,311 - INFO: Running merge step...
2024-04-07 09:38:10,442 - INFO: Reading general table from Preprocessed_Data/general_table_extended.parquet
2024-04-07 09:38:10,456 - INFO: start processing using 8 worker
0it [04:37, ?it/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/common/processing.py", line 24, in _process_parts
    df_ret = combine_fn(dfs_mapped)
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/preprocessing/merge.py", line 225, in combine_obs_and_pharma_tables
    assert ((df_pid.iloc[:, 2:].notnull().sum(axis=1) == 0).sum() == 0)
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 435, in <module>
    main()
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 377, in main
    run_preprocessing_pipeline(args.hirid_data_root, args.work_dir, args.var_ref_path,
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 330, in run_preprocessing_pipeline
    run_merge_step(hirid_data_root, var_ref_path, merged_path, nr_workers, extended_general_data_path)
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 190, in run_merge_step
    merge.merge_tables(
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/preprocessing/merge.py", line 284, in merge_tables
    processing.map_and_combine_patient_dfs(
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/common/processing.py", line 28, in map_and_combine_patient_dfs
    exec_parallel_on_parts(_process_parts, all_paths_same_part, workers)
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/common/processing.py", line 66, in exec_parallel_on_parts
    return list(tqdm.tqdm(pool.imap(fnc, part_list)))
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/multiprocess/pool.py", line 873, in next
    raise value
AssertionError
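
For reference, the failing assertion in merge.py (line 225) appears to check that no row of the combined per-patient frame is entirely null once the first two columns are skipped. Below is a minimal sketch of that check in isolation, assuming pandas is available. The toy frame, its column names (`patientid`, `datetime`, `var_a`, `var_b`) and the helper `count_empty_rows` are hypothetical and only illustrate the expression from the traceback; they are not taken from the repository.

```python
import numpy as np
import pandas as pd


def count_empty_rows(df_pid: pd.DataFrame, first_value_col: int = 2) -> int:
    """Count rows whose columns from `first_value_col` onward are all null.

    Mirrors the expression in the traceback:
    (df_pid.iloc[:, 2:].notnull().sum(axis=1) == 0).sum()
    """
    values = df_pid.iloc[:, first_value_col:]
    non_null_per_row = values.notnull().sum(axis=1)
    return int((non_null_per_row == 0).sum())


# Hypothetical toy data: the real schema of df_pid is an assumption here.
toy = pd.DataFrame(
    {
        "patientid": [1, 1, 1],
        "datetime": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:10"]
        ),
        "var_a": [80.0, np.nan, np.nan],
        "var_b": [np.nan, 36.5, np.nan],
    }
)

# The third row carries no measurement at all, so the count is non-zero
# and an assert of the form `count == 0` would raise AssertionError.
empty_rows = count_empty_rows(toy)
print(empty_rows)
```

Running a helper like this on the intermediate frame for a single patient could help locate which rows trigger the assertion, although the underlying cause in the input data would still need to be identified.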

hugoych commented 2 months ago

Hi, I wasn't able to reproduce your issue.

Which path in the HiRID public folder does your 'Data/' path correspond to?