mahmoodlab / PathomicFusion

Fusing Histology and Genomics via Deep Learning - IEEE TMI
http://www.mahmoodlab.org
GNU General Public License v3.0

Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Diagnosis and Prognosis

Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis, IEEE Transactions on Medical Imaging, 2020. [HTML] [arXiv] [Talk]
Richard J Chen, Ming Y Lu, Jingwen Wang, Drew FK Williamson, Scott J Rodig, Neal I Lindeman, Faisal Mahmood
```bibtex
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}
```

Summary: We propose a simple and scalable method for integrating histology images and -omic data using attention gating and tensor fusion. Histopathology images can be processed using CNNs, GCNs (for parameter efficiency), or a combination of the two. The setup is adaptable for integrating multiple -omic modalities with histopathology and can be used for improved diagnostic, prognostic, and therapeutic response determinations.
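
As a rough illustration of the two ideas named in the summary, the sketch below gates one modality's embedding with a sigmoid attention score and then fuses two embeddings with an outer product. It is a simplified, pure-Python stand-in and not the repository's implementation: the function names, the plain linear gate over the concatenated embeddings, and the toy vectors and weights are all assumptions made for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(features, context, weights):
    """Scale each feature by a sigmoid score computed from `context`.

    weights[i] holds one weight per entry of `context`; row i produces
    the attention score applied to features[i].
    """
    gated = []
    for value, row in zip(features, weights):
        score = sigmoid(sum(w * c for w, c in zip(row, context)))
        gated.append(score * value)
    return gated


def tensor_fusion(h_a, h_b):
    """Outer product of two embeddings after appending a constant 1 to each."""
    a = list(h_a) + [1.0]
    b = list(h_b) + [1.0]
    return [[x * y for y in b] for x in a]


# Toy embeddings (assumed values): a 2-d "histology" and a 1-d "genomic" vector.
h_img = [0.5, -1.0]
h_gen = [2.0]
context = h_img + h_gen  # the gate looks at both modalities

# Hypothetical gate weights; in a trained network these would be learned.
w_img = [[0.1, 0.0, 0.3], [0.0, -0.2, 0.1]]
w_gen = [[0.2, 0.1, 0.0]]

fused = tensor_fusion(gate(h_img, context, w_img), gate(h_gen, context, w_gen))
print(len(fused), "x", len(fused[0]))  # (2 + 1) x (1 + 1) interaction matrix
```

Appending a constant 1 to each vector before the outer product keeps the unimodal features in the last row and last column of the fused matrix, next to the pairwise interaction terms.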

Community / Follow-Up Work :)

GitHub Repositories / Projects

Updates

Setup

Prerequisites

Code Base Structure

The code base structure is explained below:

The directory structure for your multimodal dataset should look similar to the following:

./
├── data
      └── PROJECT
            ├── INPUT A (e.g. Image)
                ├── image_001.png
                ├── image_002.png
                ├── ...
            ├── INPUT B (e.g. Graph)
                ├── image_001.pkl
                ├── image_002.pkl
                ├── ...
            └── INPUT C (e.g. Genomic)
                └── genomic_data.csv
└── checkpoints
        └── PROJECT
            ├── TASK X (e.g. Survival Analysis)
                ├── path
                    ├── ...
                ├── ...
            └── TASK Y (e.g. Grade Classification)
                ├── path
                    ├── ...
                ├── ...

Depending on which modalities you are interested in combining, you must: (1) write your own function for aligning multimodal data in make_splits.py, (2) create your DatasetLoader in data_loaders.py, and (3) modify options.py for your data and task. Models are saved to the checkpoints directory, with the models for each task saved in their own subdirectory. At the moment, the only supervised learning tasks implemented are survival outcome prediction and grade classification.
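
Step (1) could be approached along the following lines. This is a minimal sketch, assuming that image and graph files share a file stem (as in the directory tree above) and that the genomic CSV has an ID column matching that stem. The function name `align_samples`, the column names `sample_id` and `gene_x`, and the toy CSV content are hypothetical and do not come from make_splits.py.

```python
import csv
import io
from pathlib import PurePath


def align_samples(image_files, graph_files, genomic_rows, id_column="sample_id"):
    """Return one record per sample ID that is present in all three inputs."""
    images = {PurePath(f).stem: f for f in image_files}
    graphs = {PurePath(f).stem: f for f in graph_files}
    genomics = {row[id_column]: row for row in genomic_rows}
    # Keep only samples for which every modality is available.
    shared = sorted(images.keys() & graphs.keys() & genomics.keys())
    return [
        {"id": s, "image": images[s], "graph": graphs[s], "genomic": genomics[s]}
        for s in shared
    ]


# Toy stand-in for genomic_data.csv; the column names are made up.
toy_csv = io.StringIO("sample_id,gene_x\nimage_001,0.7\nimage_003,1.2\n")

records = align_samples(
    ["image_001.png", "image_002.png"],
    ["image_001.pkl", "image_002.pkl"],
    list(csv.DictReader(toy_csv)),
)
for record in records:
    print(record["id"], record["image"], record["graph"])
```

Samples missing from any one modality are dropped here. Whether to drop or impute them is a design decision that depends on your dataset.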

Training and Evaluation

Here are example commands for training unimodal + multimodal networks.

Survival Model for Input A

The example below trains a survival model for mode A and saves the model checkpoints and predictions at the end of each split. It creates a folder called "CNN_A" in "./checkpoints/example/" that holds all the models from cross-validation. It assumes that "A" is defined as a mode in data_loaders.py for handling modality-specific data preprocessing steps (random crop + flip + jittering for images), and that a network for input A is defined in networks.py. "surv" is already defined as a task for training survival analysis networks in options.py, networks.py, train_test.py, and train_cv.py.

python train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0

To obtain test predictions on only the test splits in your cross-validation, you can replace "train_cv" with "test_cv".

python test_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
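
The image preprocessing mentioned for mode A (random crop, flip, and jittering) might look roughly like the toy sketch below. It works on a grayscale image stored as nested lists so that it runs with the standard library alone. The function name `augment`, its parameters, and the brightness-only jitter are assumptions for illustration and are not taken from the repository's data loader.

```python
import random


def augment(image, crop_size, rng, jitter=0.1):
    """Random crop, random horizontal flip, and brightness jitter.

    `image` is a list of rows of floats in [0, 1].
    """
    height, width = len(image), len(image[0])
    # Random crop: pick the top-left corner so the patch stays inside the image.
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    patch = [row[left:left + crop_size] for row in image[top:top + crop_size]]
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    # Brightness jitter: scale all pixels by a factor near 1, then clamp to [0, 1].
    scale = 1.0 + rng.uniform(-jitter, jitter)
    return [[min(1.0, max(0.0, v * scale)) for v in row] for row in patch]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
patch = augment(image, crop_size=4, rng=rng)
print(len(patch), "x", len(patch[0]))
```

Augmentation of this kind is normally applied only during training. Test-time preprocessing is usually deterministic.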

Grade Classification Model for Input A + B

The example below trains a grade classification model that fuses modes A and B. As in the previous example, it creates a folder called "Fusion_AB" in "./checkpoints/example/" that holds all the models from cross-validation. It assumes that "AB" is defined as a mode in data_loaders.py for handling inputs A and B at the same time. "grad" is already defined as a task for training grade classification networks in options.py, networks.py, train_test.py, and train_cv.py.

python train_cv.py --exp_name grad --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task grad --mode AB --model_name Fusion_AB --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
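
Because the example commands differ only in the script name and a few arguments, a small helper can assemble them and keep the shared flags in one place. The sketch below uses only flags that appear in the commands in this README. The helper name `build_command` and the grouping into `PATHS` and `HYPERPARAMETERS` are assumptions for illustration.

```python
import shlex

# Flags shared by the example commands in this README.
PATHS = {
    "dataroot": "./data/example/",
    "checkpoints_dir": "./checkpoints/example/",
}
HYPERPARAMETERS = {
    "niter": 0,
    "niter_decay": 50,
    "batch_size": 64,
    "reg_type": "none",
    "init_type": "max",
    "lr": 0.002,
    "weight_decay": "4e-4",
    "gpu_ids": 0,
}


def build_command(script, exp_name, task, mode, model_name):
    """Return the argument list for one run of train_cv.py or test_cv.py."""
    options = {
        "exp_name": exp_name,
        **PATHS,
        "task": task,
        "mode": mode,
        "model_name": model_name,
        **HYPERPARAMETERS,
    }
    argv = ["python", script]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


# Train and test commands for the survival model on mode A.
train_a = build_command("train_cv.py", "surv", "surv", "A", "CNN_A")
test_a = build_command("test_cv.py", "surv", "surv", "A", "CNN_A")
print(shlex.join(train_a))
print(shlex.join(test_a))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the repository root. Other tasks and modes only change the arguments given to `build_command`.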

Reproducibility

To reproduce the results in our paper, and for exact data preprocessing, implementation, and experimental details, please follow the instructions here: ./data/TCGA_GBMLGG/. Processed data and trained models can be downloaded here.

Issues

Licenses, Usages, and Acknowledgements

@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}

© Mahmood Lab - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.