Closed constantinkappel closed 6 months ago
Thank you for opening an issue about this.
Whether it's for prediction or for pretraining, inputs for MIMIC-III and eICU must be generated sequentially before preparing the pooled input. This is because the pooled input merges the inputs from MIMIC-III and eICU.
When you run the preprocessing code for the pretrain dataset from preprocess_run.sh in the example run directory,
try specifying dataset_path as shown below.
(dataset_path is the same path that was used when preparing the predict dataset.)
Also, please use the same "dest_path" that you used for the predict dataset. This will create a subdirectory named "mlm" under "dest_path".
I have added "dataset_path" to the example run code.
python ../preprocess/preprocess_main.py \
--src_data mimiciii \
--dataset_path /user/mimiciii \
--dest_path /user/descemb/dataset \
--data_type pretrain ;
python ../preprocess/preprocess_main.py \
--src_data eicu \
--dataset_path /user/eicu \
--dest_path /user/descemb/dataset \
--data_type pretrain ;
python ../preprocess/preprocess_main.py \
--src_data pooled \
--dest_path /user/descemb/dataset \
--data_type pretrain ;
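Because the pooled step depends on the two per-dataset steps, it can help to stop the script as soon as one step fails instead of continuing with `;`. The following is only a sketch of that idea, not part of the repository: `run_step`, `run_all`, and the `DRY_RUN` switch are hypothetical names, and the paths are the example paths from above. With `DRY_RUN=1` (the default here) it only prints the three commands in the required order.

```shell
#!/usr/bin/env bash
# Sketch: run the pretrain preprocessing steps in order, aborting on the first failure.
set -euo pipefail

DEST_PATH=/user/descemb/dataset   # same dest_path as for the predict dataset
DRY_RUN="${DRY_RUN:-1}"           # hypothetical switch: 1 prints commands, 0 runs them

run_step() {
  # $1 = src_data, $2 = dataset_path (left out for the pooled step)
  local src="$1" dataset_path="${2:-}"
  local cmd=(python ../preprocess/preprocess_main.py --src_data "$src")
  if [ -n "$dataset_path" ]; then
    cmd+=(--dataset_path "$dataset_path")
  fi
  cmd+=(--dest_path "$DEST_PATH" --data_type pretrain)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

run_all() {
  run_step mimiciii /user/mimiciii
  run_step eicu /user/eicu
  run_step pooled   # only reached if both steps above succeeded
}

run_all
```

Run it with `DRY_RUN=0` to actually execute the commands; with `set -e`, a failing MIMIC-III or eICU step ends the script before the pooled step starts.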
Thank you so much, @hoon9405, for getting back so quickly!
I tried again with your changes. Essentially, it meant that I also had to include the dataset_path for pretraining.
I am sharing my version of your script, in case it might help somebody:
INPUT_PATH=/home/user/data
OUTPUT_PATH=/home/user/data/output
DX_PATH=$INPUT_PATH/ccs_multi_dx_tool_2015.csv
python ../preprocess/preprocess_main.py \
--src_data mimiciii \
--dataset_path $INPUT_PATH/mimic \
--ccs_dx_tool_path $DX_PATH \
--dest_path $OUTPUT_PATH ;
python ../preprocess/preprocess_main.py \
--src_data eicu \
--dataset_path $INPUT_PATH/eicu \
--ccs_dx_tool_path $DX_PATH \
--dest_path $OUTPUT_PATH ;
python ../preprocess/preprocess_main.py \
--src_data pooled \
--ccs_dx_tool_path $DX_PATH \
--dest_path $OUTPUT_PATH ;
python ../preprocess/preprocess_main.py \
--src_data mimiciii \
--dataset_path $INPUT_PATH/mimic \
--dest_path $OUTPUT_PATH \
--ccs_dx_tool_path $DX_PATH \
--data_type pretrain ;
python ../preprocess/preprocess_main.py \
--src_data eicu \
--dataset_path $INPUT_PATH/eicu \
--dest_path $OUTPUT_PATH \
--ccs_dx_tool_path $DX_PATH \
--data_type pretrain ;
python ../preprocess/preprocess_main.py \
--src_data pooled \
--dest_path $OUTPUT_PATH \
--ccs_dx_tool_path $DX_PATH \
--data_type pretrain ;
Closing issue.
Thanks for pushing so many updates recently!
I am running preprocessing from your latest commit 499d9d6. I am using
./run_example/preprocessing_run.sh
with the following modifications:
While doing pooled preprocessing I get:
I checked the file system and a folder
/myfolder/output/mlm/pooled
was created, but there is no mimiciii_df.pkl
in there. Rather, there is such a file in /myfolder/output/.
Is it OK to just set a symbolic link in /myfolder/output/mlm
as a workaround, or are these supposed to be entirely different files?
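For anyone checking the same thing, a quick way to see where the file ended up is to search the whole output tree. This is just a sketch: `locate_outputs` and `OUTPUT_ROOT` are hypothetical names, and it does not say anything about whether the files under `mlm` are meant to be the same as the ones at the top level.

```shell
#!/usr/bin/env bash
# Sketch: list every copy of a preprocessing output file below a root directory.
set -euo pipefail

locate_outputs() {
  # $1 = root directory to search, $2 = file name to look for
  local root="$1" name="$2"
  find "$root" -type f -name "$name" | sort
}

OUTPUT_ROOT="${OUTPUT_ROOT:-/myfolder/output}"   # hypothetical variable; path taken from the post above
if [ -d "$OUTPUT_ROOT" ]; then
  locate_outputs "$OUTPUT_ROOT" mimiciii_df.pkl
fi
```

If only the top-level path is printed, the pretrain run for MIMIC-III has probably not written its own copy under `mlm` yet.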