sagizty / BigModel

Repo for public Foundation models for pathological image analysis

Foundation model pipeline for WSI + ROI


This is an open-source learning pipeline containing multiple components for WSI and ROI foundation models.

The licenses for imported code follow those of the original code.

Install

The following assumes a machine with an NVIDIA A100 Tensor Core GPU and the CUDA toolkit enabled.

  1. Download our repository and open the path

    git clone https://github.com/sagizty/BigModel.git
    cd BigModel
  2. Install dependencies

    conda env create -f environment.yaml
    conda activate BigModel
    pip install -e .
  1. Tile Cropping

    python Tiles_dataset.py \
    --WSI_dataset_path /data/hdd_1/BigModel/TCGA-LUAD-LUSC/TCGA-LUAD-raw \
    --tiled_WSI_dataset_path /data/hdd_1/BigModel/TCGA-LUAD-LUSC/tiles_datasets \
    --edge_size 224 \
    --target_mpp 0.5
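
To illustrate what `--edge_size` and `--target_mpp` imply together, here is a minimal sketch of the usual resolution arithmetic. It is not the code in `Tiles_dataset.py`: it assumes the slide reports a native microns-per-pixel (mpp) value, and `native_crop_size` and `tile_origins` are hypothetical helper names.

```python
def native_crop_size(edge_size: int, target_mpp: float, native_mpp: float) -> int:
    """Side length, in native-resolution pixels, of the region that becomes one tile."""
    if native_mpp <= 0 or target_mpp <= 0:
        raise ValueError("mpp values must be positive")
    # One output pixel covers target_mpp microns, so a tile spans
    # edge_size * target_mpp microns; dividing by native_mpp gives pixels.
    return round(edge_size * target_mpp / native_mpp)


def tile_origins(width: int, height: int, crop: int):
    """Top-left corners of non-overlapping crops that fit fully inside the slide."""
    return [(x, y)
            for y in range(0, height - crop + 1, crop)
            for x in range(0, width - crop + 1, crop)]


# Assumed example: a slide scanned at 0.25 mpp, tiled for 224 px tiles at 0.5 mpp.
crop = native_crop_size(edge_size=224, target_mpp=0.5, native_mpp=0.25)
origins = tile_origins(width=1000, height=900, crop=crop)
```

Under these assumptions each 224 px tile is cut as a 448 px region at native resolution and then downsampled by a factor of two.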
  2. Tile Embedding

    python Embedded_dataset.py \
    --WSI_dataset_path /data/hdd_1/BigModel/TCGA-LUAD-LUSC/tiles_datasets \
    --embedded_WSI_dataset_path /data/hdd_1/BigModel/TCGA-LUAD-LUSC/slide_embeddings/gigapath \
    --model_name gigapath \
    --edge_size 224 \
    --PrefetchDataLoader_num_workers 10 \
    --batch_size 256
  3. Build MTL dataset for WSI

    python DownStream/MTL/slide_dataset_tools.py \
    --root_path /data/hdd_1/BigModel/embedded_datasets/TCGA-LUAD-LUSC-gigapath \
    --task_description_csv /home/workenv/PuzzleAI/Archive/dataset_csv/TCGA_Log_Transcriptome_Final.csv \
    --slide_id_key patient_id \
    --split_target_key fold_information \
    --task_setting_folder_name task-settings \
    --mode TCGA \
    --dataset_name luad-lusc
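
As a rough picture of how `--slide_id_key` and `--split_target_key` relate to the task description CSV, the sketch below groups slide IDs by the value in the split column. It is an illustration only, not the logic of `slide_dataset_tools.py`; `group_by_split`, the split labels and the `task_a` column are hypothetical, since the real CSV is not shown here.

```python
import csv
import io
from collections import defaultdict


def group_by_split(csv_file, slide_id_key="patient_id",
                   split_target_key="fold_information"):
    """Map each value of the split column to the slide IDs that carry it."""
    splits = defaultdict(list)
    for row in csv.DictReader(csv_file):
        splits[row[split_target_key]].append(row[slide_id_key])
    return dict(splits)


# Hypothetical CSV content standing in for the real task description file.
demo_csv = io.StringIO(
    "patient_id,fold_information,task_a\n"
    "P001,Train,0.12\n"
    "P002,Val,0.40\n"
    "P003,Train,0.33\n"
)
splits = group_by_split(demo_csv)
```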
  4. Run MTL task with WSI MTL framework

    # Train
    python DownStream/WSI_finetune/MTL_Train.py \
    --model_name gigapath \
    --root_path /data/ssd_1/CPIA_processed/embedded_datasets/TCGA-COAD \
    --local_weight_path /home/workenv/PuzzleAI/ModelWeight/prov-gigapath/slide_encoder.pth \
    --save_model_path /data/private/BigModel/saved_models \
    --runs_path /data/private/BigModel/runs \
    --task_description_csv /home/workenv/PuzzleAI/Archive/dataset_csv/TCGA_Log_Transcriptome_Final.csv \
    --task_setting_folder_name task-settings \
    --slide_id_key patient_id \
    --split_target_key fold_information \
    --num_epochs 100 \
    --warmup_epochs 10 \
    --intake_epochs 50

    # Test
    python DownStream/WSI_finetune/MTL_Test.py \
    --model_name gigapath \
    --root_path /data/ssd_1/CPIA_processed/embedded_datasets/TCGA-COAD \
    --save_model_path /data/private/BigModel/saved_models \
    --runs_path /data/private/BigModel/runs \
    --task_description_csv /home/workenv/PuzzleAI/Archive/dataset_csv/TCGA_Log_Transcriptome_Final.csv \
    --task_setting_folder_name task-settings \
    --slide_id_key patient_id \
    --split_target_key fold_information

    # Decode the test results to csv
    python Utils/Decode_correlation.py \
    --model_name gigapath \
    --root_path /data/ssd_1/CPIA_processed/embedded_datasets/TCGA-COAD \
    --runs_path /data/private/BigModel/runs \
    --WSI_tasks True \
    --task_setting_folder_name task-settings
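
The training command passes `--num_epochs 100` and `--warmup_epochs 10`. This README does not say which scheduler `MTL_Train.py` uses, so the sketch below only shows one common interpretation, a linear warmup followed by cosine decay. `lr_scale` is a hypothetical helper, and `--intake_epochs` is not modelled.

```python
import math


def lr_scale(epoch: int, num_epochs: int = 100, warmup_epochs: int = 10) -> float:
    """Multiplier applied to the base learning rate at a 0-indexed epoch."""
    if epoch < warmup_epochs:
        # Linear ramp from 1/warmup_epochs up to 1.0.
        return (epoch + 1) / warmup_epochs
    # Cosine decay from 1.0 towards 0.0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed schedule for the flag values shown in the training command.
schedule = [lr_scale(e) for e in range(100)]
```

With these values the multiplier rises over the first 10 epochs, reaches 1.0, and then decays smoothly towards zero by the last epoch.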
  5. Run ROI level tasks
# todo need demo here
  6. Run ROI level SSL pretraining
# todo
  7. Run WSI level SSL pretraining
# todo
  8. Run WSI level VQA-tuning after pretraining
# todo
  9. Run WSI level VQA application
# todo
  10. Run ROI level VQA application
# todo