Hi! I ran into the following error when running the ICU benchmarks preprocessing script on raw HIRID data with this command:
!python icu_benchmarks/run.py preprocess --hirid-data-root 'Data/' \
    --work-dir 'Preprocessed_Data' \
    --var-ref-path ./preprocessing/resources/varref.tsv \
    --split-path ./preprocessing/resources/split.tsv \
    --nr-workers 8
The error:
2024-04-07 09:35:41,037 - INFO: Generating extended general table in Preprocessed_Data/general_table_extended.parquet
2024-04-07 09:38:08,311 - INFO: Running merge step...
2024-04-07 09:38:10,442 - INFO: Reading general table from Preprocessed_Data/general_table_extended.parquet
2024-04-07 09:38:10,456 - INFO: start processing using 8 worker
0it [04:37, ?it/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/common/processing.py", line 24, in _process_parts
    df_ret = combine_fn(dfs_mapped)
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/preprocessing/merge.py", line 225, in combine_obs_and_pharma_tables
    assert ((df_pid.iloc[:, 2:].notnull().sum(axis=1) == 0).sum() == 0)
AssertionError
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 435, in <module>
    main()
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 377, in main
    run_preprocessing_pipeline(args.hirid_data_root, args.work_dir, args.var_ref_path,
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 330, in run_preprocessing_pipeline
    run_merge_step(hirid_data_root, var_ref_path, merged_path, nr_workers, extended_general_data_path)
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/run.py", line 190, in run_merge_step
    merge.merge_tables(
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/preprocessing/merge.py", line 284, in merge_tables
    processing.map_and_combine_patient_dfs(
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/common/processing.py", line 28, in map_and_combine_patient_dfs
    exec_parallel_on_parts(_process_parts, all_paths_same_part, workers)
  File "/content/gdrive/MyDrive/DynaGraph/HIRID/HIRID-ICU-Benchmark/icu_benchmarks/common/processing.py", line 66, in exec_parallel_on_parts
    return list(tqdm.tqdm(pool.imap(fnc, part_list)))
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/multiprocess/pool.py", line 873, in next
    raise value
AssertionError
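For context, the failing assertion in `combine_obs_and_pharma_tables` requires that no row of the merged per-patient frame is entirely null outside the first two (id/time) columns. Here is a minimal sketch of what it checks and how one might locate the offending rows in a merged part file; the column names are hypothetical, not the benchmark's actual schema:

```python
import numpy as np
import pandas as pd

# Toy merged per-patient frame: first two columns are id and timestamp,
# the rest are observation/pharma values (names are made up for illustration).
df_pid = pd.DataFrame({
    "patientid": [1, 1, 1],
    "datetime": pd.date_range("2024-01-01", periods=3, freq="h"),
    "hr": [80.0, np.nan, 75.0],
    "map": [65.0, np.nan, np.nan],
})

# The assertion in merge.py demands that no row is all-null past column 2.
# Row index 1 above violates this, so the check would fail here.
all_null_rows = df_pid.iloc[:, 2:].notnull().sum(axis=1) == 0
print(int(all_null_rows.sum()))   # number of fully-empty rows

# Inspecting the offending rows can help trace them back to a
# corrupt or partially downloaded input part:
print(df_pid[all_null_rows])
```

If this count is nonzero on your real data, it usually points at an input problem (e.g. an incomplete raw-part download) rather than a bug in the assertion itself.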