Patch-GCN main.py does not work

liuxiaoping2020 commented 2 years ago

First, I would like to thank the Mahmood Lab @mahmoodlab and its scientists @Richarizardd for such outstanding work! I downloaded the whole slide images (WSI) of TCGA-BLCA from GDC and preprocessed the svs files with your CLAM pipeline, following the instructions on the main GitHub page of CLAM:

(1) Created patches using the Otsu method, which generated masks, stitches, and patches. Code:

    CUDA_VISIBLE_DEVICES=0,1 python create_patches_fp.py --source /mnt1/TCGA/data --save_dir /mnt1/TCGA/ostu_result --patch_size 256 --preset tcga_ostu.csv --seg --patch --stitch

(2) Extracted features from the h5 files generated above, which created pt files for subsequent analysis. Code:

    CUDA_VISIBLE_DEVICES=0,1 python extract_features_fp.py --data_h5_dir /mnt1/TCGA/ostu_result --data_slide_dir /mnt1/TCGA/data --csv_path /mnt1/TCGA/ostu_result/process_list_autogen.csv --feat_dir /mnt1/TCGA/ostu_result --batch_size 512 --slide_ext .svs

(3) I created graphs following the instructions at https://github.com/mahmoodlab/Patch-GCN/blob/master/WSI-Graph%20Construction.ipynb, which produced graph-based pt files.

Then I tried to analyze the TCGA-BLCA WSIs using Patch-GCN. To do this, I set the directory ostu_result as the DATA_ROOT_DIR and put two sub-directories in it: (1) splits/5foldcv/tcga_blca/split_0.csv...split_4.csv; (2) tcga_blca/*.pt (the pt files of the graphs generated above). The directories and data are organized as follows:

    ostu_result/
    |--tcga_blca
    ||--slide_1.pt
    ||--slide_2.pt
    ||...
    |--splits
    ||--5foldcv
    |||--tcga_blca
    ||||--splits_0.csv
    ||||--splits_1.csv
    ||||--splits_2.csv
    ||||--splits_3.csv
    ||||--splits_4.csv
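A layout like the one described above can be checked before launching main.py. The following is a minimal sketch, not part of Patch-GCN: the helper name `check_layout` and its parameters are hypothetical, and it assumes the split files are named splits_0.csv to splits_4.csv as in the tree.

```python
from pathlib import Path


def check_layout(data_root, study="tcga_blca", which_splits="5foldcv", n_folds=5):
    """Return the expected paths that are missing under data_root.

    Hypothetical helper for illustration; names and defaults are assumptions
    taken from the directory tree in this post, not from Patch-GCN itself.
    """
    root = Path(data_root)
    missing = []

    # The graph directory must exist and contain at least one .pt file.
    graph_dir = root / study
    if not graph_dir.is_dir() or not any(graph_dir.glob("*.pt")):
        missing.append(str(graph_dir / "*.pt"))

    # One split CSV is expected per cross-validation fold.
    split_dir = root / "splits" / which_splits / study
    for fold in range(n_folds):
        split_file = split_dir / f"splits_{fold}.csv"
        if not split_file.is_file():
            missing.append(str(split_file))

    return missing
```

Calling `check_layout("/mnt1/TCGA/ostu_result")` would return an empty list when every expected file is present, and the list of missing paths otherwise. It only checks the layout described in this post, so it would not catch a loader that looks in a differently named directory.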

Then I ran Patch-GCN using the following command:

    CUDA_VISIBLE_DEVICES=0,1 python main.py --data_root_dir "/mnt1/TCGA/ostu_result" --which_splits 5foldcv --split_dir tcga_blca --mode graph --model_type patchgcn

However, it failed with the following traceback:

    Traceback (most recent call last):
      File "main.py", line 221, in <module>
        results = main(args)
      File "main.py", line 54, in main
        train_dataset, val_dataset = dataset.return_splits(from_id=False,
      File "/mnt1/Patch-GCN/datasets/dataset_survival.py", line 193, in return_splits
        train_split = self.get_split_from_df(all_splits=all_splits, split_key='train')
      File "/mnt1/Patch-GCN/datasets/dataset_survival.py", line 180, in get_split_from_df
        split = Generic_Split(df_slice, metadata=self.metadata, mode=self.mode, data_dir=self.data_dir, label_col=self.label_col, patient_dict=self.patient_dict, num_classes=self.num_classes)
      File "/mnt1/Patch-GCN/datasets/dataset_survival.py", line 287, in __init__
        with open(os.path.join(data_dir, 'fast_cluster_ids.pkl'), 'rb') as handle:
    FileNotFoundError: [Errno 2] No such file or directory: '/mnt1/TCGA/ostu_result/tcga_blca_20x_features/fast_cluster_ids.pkl'
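The path in the FileNotFoundError can be taken apart to see where the loader was looking. The short sketch below uses only the two paths quoted in this post and makes no assumption about how main.py builds them. It shows that the file was looked up in a tcga_blca_20x_features directory under the data root, not in the tcga_blca directory described earlier.

```python
import os

# Both values are copied from the command and the error message in this post.
data_root_dir = "/mnt1/TCGA/ostu_result"
failing_path = "/mnt1/TCGA/ostu_result/tcga_blca_20x_features/fast_cluster_ids.pkl"

# Split the failing path into the directory the loader used and the file name.
data_dir, filename = os.path.split(failing_path)

# Express that directory relative to the data root passed on the command line.
subdir = os.path.relpath(data_dir, data_root_dir)

print(subdir)    # sub-directory the loader resolved under the data root
print(filename)  # file it tried to open there
```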

I am a Python beginner, and this error really confuses me. I preprocessed the WSI data and organized the directories and pt files according to the instructions, but I did not create the tcga_blca_20x_features/fast_cluster_ids.pkl file or directory, and I do not know how to create it. I guess something is wrong with the data directory or the dataset preparation for main.py, but I do not know how to fix it.

Could you please help me resolve this error? Are there any detailed pipelines, examples, or tutorials that explain how to prepare the data and the associated directories for the main.py program of Patch-GCN?

Richarizardd commented 2 years ago

Hi @liuxiaoping2020 - apologies for the delay. The file "tcga_blca_20x_features/fast_cluster_ids.pkl" is for benchmarking DeepAttnMISL. I am going to fix the code soon so that fast_cluster_ids.pkl is no longer required to run it.
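For readers who hit the same error before a fix lands, one way to make the pickle optional is to load it only when the selected mode needs it. This is a minimal sketch of that idea, not the actual Patch-GCN patch: the helper `load_cluster_ids` is hypothetical, and the assumption that only a mode named 'cluster' uses the file is mine.

```python
import os
import pickle


def load_cluster_ids(data_dir, mode, filename="fast_cluster_ids.pkl"):
    """Load cluster ids only when the mode needs them; otherwise return None.

    Hypothetical helper: the 'cluster' mode name and this function are
    assumptions for illustration, not Patch-GCN's actual code.
    """
    if mode != "cluster":
        # Other modes (for example 'graph') never touch the pickle.
        return None

    path = os.path.join(data_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} is required for mode='cluster' but was not found"
        )
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

With a guard like this, a run with `--mode graph` would skip the file entirely, while a cluster-based run would still fail early with a message that names the missing file.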

liuxiaoping2020 commented 2 years ago

Thank you very much for your reply. I look forward to the updated source code.

Richarizardd commented 2 years ago

Hi @liuxiaoping2020 - sorry for the delayed reply. I was traveling this past month and have been meaning to make time to follow up on this. The error occurs because fast_cluster_ids.pkl is needed for DeepAttnMISL, which is not essential for this code repository. Let me know if there are more issues with this repository.

Luxiaowen45 commented 1 year ago

Hi, I'm a Python beginner, and I was wondering about this command:

    CUDA_VISIBLE_DEVICES=0,1 python create_patches_fp.py --source /mnt1/TCGA/data --save_dir /mnt1/TCGA/ostu_result --patch_size 256 --preset tcga_ostu.csv --seg --patch --stitch

Where is the create_patches_fp.py file used in this command? Also, isn't the data preprocessing (Otsu) code public? If so, where can I find it?

Patch-GCN main.py does not work #9