# Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification
## Project Info

This project is the official implementation of the MMIL-Transformer proposed in the paper *Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification*.
## News

A new grouping method (MSA grouping) and the corresponding pre-trained weights will be added soon.
## Prerequisites
- Python 3.8.10
- PyTorch 1.12.1
- torchmetrics 0.4.1
- CUDA 11.6
- numpy 1.24.2
- einops 0.6.0
- scikit-learn 1.2.2
- h5py 3.8.0
- pandas 2.0.0
- nystrom_attention
- argparse
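Before running anything, it can help to sanity-check that the environment matches the versions above. A minimal sketch (not part of the repository):

```python
# Minimal environment check (sketch); confirms the packages listed above import
# and prints their versions for comparison against the pinned versions.
import torch, torchmetrics, numpy, einops, sklearn, h5py, pandas
import nystrom_attention  # no version pin listed; the import alone confirms it is installed

print("PyTorch:", torch.__version__)              # expect 1.12.1
print("CUDA available:", torch.cuda.is_available())
print("torchmetrics:", torchmetrics.__version__)  # expect 0.4.1
print("numpy:", numpy.__version__)                # expect 1.24.2
print("einops:", einops.__version__)              # expect 0.6.0
print("scikit-learn:", sklearn.__version__)       # expect 1.2.2
print("h5py:", h5py.__version__)                  # expect 3.8.0
print("pandas:", pandas.__version__)              # expect 2.0.0
```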
## Pretrained Weights

Each test experiment was run 10 times; the reported ACC and AUC are averages over those runs.
| model name | grouping method | weight | ACC | AUC |
|------------|-----|:------:|----|----|
| `TCGA_embed`|Embedding grouping|[HF link](https://huggingface.co/RJKiseki/MMIL-Transformrt/blob/main/TCGA_embed.pt) | 93.15% | 98.97% |
| `TCGA_random`|Random grouping|[HF link](https://huggingface.co/RJKiseki/MMIL-Transformrt/blob/main/TCGA_random.pt) | 94.37%| 99.04% |
| `TCGA_random_with_subbags_0.75masked`|Random grouping + mask|[HF link](https://huggingface.co/RJKiseki/MMIL-Transformrt/blob/main/TCGA_random_mask_0.75.pt) | 93.95%| 99.02% |
| `camelyon16_random`|Random grouping|[HF link](https://huggingface.co/RJKiseki/MMIL-Transformrt/blob/main/camelyon16_random.pt) | 91.78% | 94.07% |
| `camelyon16_random_with_subbags_0.6masked`| Random grouping + mask|[HF link](https://huggingface.co/RJKiseki/MMIL-Transformrt/blob/main/camelyon16_mask_0.6.pt) | 93.41% | 94.74% |
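To use a pretrained weight, download the `.pt` file from the corresponding link and load it with `torch.load`. A minimal sketch; the exact checkpoint layout and model constructor are defined by this repository's code, so the names below are illustrative only:

```python
import torch

# Load a downloaded checkpoint on CPU; the file is assumed to contain a
# state dict (or a dict wrapping one) as saved by this repository.
ckpt = torch.load("TCGA_embed.pt", map_location="cpu")

# Inspect what the checkpoint actually contains before wiring it into
# the model built from this repository's code.
if isinstance(ckpt, dict):
    for key in list(ckpt)[:10]:
        print(key)

# model.load_state_dict(ckpt)  # once `model` is constructed from this repo
```

In practice, `main.py --test` handles this loading for you (see "Test the model" below); the sketch is only for inspecting a checkpoint by hand.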
## Usage

### Dataset

#### Preprocess TCGA Dataset

We use the same data preprocessing configuration as DSMIL. Alternatively, you can directly download the feature vectors they provide for TCGA.
#### Preprocess CAMELYON16 Dataset

We use CLAM to preprocess CAMELYON16 at 20x magnification.
#### Preprocessed feature vectors

Preprocessing WSIs is time-consuming and difficult, so we also provide the processed feature vectors for both datasets. The aforementioned works, DSMIL and CLAM, greatly simplify this preprocessing. Thanks again for their wonderful work!
| Dataset | Link | Disk usage |
|------------|:-----:|----|
| `TCGA`|[HF link](https://huggingface.co/datasets/RJKiseki/TCGA/tree/main)| 16GB |
| `CAMELYON16`|[HF link](https://huggingface.co/datasets/RJKiseki/CAMELYON16/tree/main)|20GB|
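The downloaded features are stored as HDF5 files. A minimal sketch for inspecting one with `h5py`; the dataset keys (e.g. `features` and `coords`, as produced by CLAM-style preprocessing) are an assumption, so list the actual keys first:

```python
import h5py

# Hypothetical file name; substitute one of the downloaded .h5 files.
with h5py.File("slide_features.h5", "r") as f:
    print(list(f.keys()))            # list the datasets actually present
    feats = f["features"][:]         # assumed key for patch feature vectors
    print(feats.shape, feats.dtype)  # e.g. (num_patches, feature_dim)
```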
### Test the model
For TCGA testing:
```bash
python main.py \
--test {Your_Path_to_Pretrain} \
--num_test 10 \
--type TCGA \
--num_subbags 4 \
--mode {embed or random} \
--num_msg 1 \
--num_layers 2 \
--csv {Your_Path_to_TCGA_csv} \
--h5 {Your_Path_to_h5_file}
```
For CAMELYON16 testing:
```bash
python main.py \
--test {Your_Path_to_Pretrain} \
--num_test 10 \
--type camelyon16 \
--num_subbags 10 \
--mode random \
--num_msg 1 \
--num_layers 2 \
--csv {Your_Path_to_CAMELYON16_csv} \
--h5 {Your_Path_to_h5_file}
```
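The `--csv` file supplies slide-level labels and the `--h5` path points to the extracted features. The exact CSV schema comes from the preprocessing pipeline above; as a hedged sanity check with pandas (the file name and column layout are assumptions, adapt them to your own files):

```python
import pandas as pd

# Hypothetical path; use the CSV produced by your DSMIL/CLAM preprocessing.
df = pd.read_csv("tcga_labels.csv")
print(df.columns.tolist())             # confirm slide-ID and label columns exist
print(df.head())
print(df.iloc[:, -1].value_counts())   # rough class-balance check on the last column
```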