nadeemlab / Physics-ArX

Physics-based data augmentation library for quantifying CT and CBCT images in radiotherapy [PMB'23, PMB'21, Medical Physics'21, AAPM'21]
56 stars 16 forks source link
artifacts-generation computed-tomography cone-beam-ct data-augmentation deep-learning motion-simulation physics-simulation radiology radiotherapy

Physics-ArX: Physics-based data augmentation for CT and CBCT quantification in radiotherapy

Physics-ArX python library provides an easy-to-use physics-based data augmentation pipeline for inducing realistic cone-beam CT (CBCT) artifacts on high-quality planning CT images, which results in perfectly paired/registered 3D synthetic CBCT (sCBCT) and planning CT (+ segmentation/dose) data. The motivation behind Physics-ArX library is to create a computationally cheap and deterministic pipeline to deteriorate high-quality planning CT images to simulate realistic CBCT image variations while leveraging high-quality gross tumor volume (GTV) and organs-at-risk (OARs) segmentations to train robust and generalizable fully-supervised deep learning algorithms (this is in contrast to prior approaches that rely on unpaired CT-CBCT data and error-prone/hallucinating unsupervised deep learning algorithms). Specifically, the scatter artifacts are extracted and transferred from corresponding baseline (pre-treatment) CBCT image whereas the motion artifacts are derived from corresponding 4D CT image (both baseline CBCT and 4D CT images are easily accessible/available in non-small cell lung cancer radiotherapy settings).

For preliminary data, we have shown that segmentation models trained on our Physics-ArX augmentated data can generalize across CT and CBCT images (PMB'21). Moreover, since Physics-ArX results in large number of perfecly-paired/registered planning CT and sCBCT variations, for the first time we also created 3D multitask CBCT-to-CT translation and OAR segmentation deep learning models that can quantify all daily/weekly CBCTs acquired during radiotherapy (Medical Physics'21). Finally, we also demonstrated a new motion artifact induction pipeline for CT and CBCT images using 4D CT data in our AAPM'21 abstract. We have also released our SeqX2Y model which allows generation of physically realistic 4D CT data from static 3D CT images (PMB'23); this SeqX2Y model can be used in tandem with the AAPM'21 work to generate motion artifacts on CT and CBCT images.

Updates:

Multitask 3D CBCT-to-CT Translation and Organs-at-Risk Segmentation Using Physics-Based Data Augmentation

Read Link

Weekly cone-beam computed tomography (CBCT) images are primarily used for patient setup during radiotherapy. To quantify CBCT images, we present a 3D multitask deep learning model for simultaneous CBCT-to-CT translation and organs-at-risk (OARs) segmentation driven by a novel physics-based artifact/noise-induction data augmentation pipeline. The data augmentation technique creates multiple paired/registered synthetic CBCTs corresponding to a single planning CT which in turn can be used to translate real weekly CBCTs to better quality CTs while performing OAR segmentation using the high-quality planning CT contours. Given the resultant perfectly-paired CBCT and planning CT/contours data, we use supervised conditional generative adversarial network as the base model which, unlike CycleGAN -- prevalent in CBCT-to-CT translation literature -- and other unsupervised image-to-image translation methods, does not hallucinate or produce randomized outputs. We also use a large 95 patient lung cancer dataset with planning CT and weekly CBCTs. overview_imageFigure. Entire workflow of generating different variations of perfectly paired pseudo CBCT variations from single deformable registered planning CT and week1 CBCT pair. The generated data is used to train a multitask 3D pix2pix model to generate synthetic CT and OAR segmentations.

Prerequisites:

NVIDIA GPU (Tested on NVIDIA RTX 2080 Ti)
CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification)
Python 3
Pytorch>=0.4.0
torchvision>=0.2.1
dominate>=2.3.1
visdom>=0.1.8.3

Dataset:

All images must be 128x128x128 in size (DxHxW). During training the network takes a 128x128x128 CBCT image as input and produces the corresponding synthetic CT and a synthetic multi-label OAR segmentation. During training, to compute the loss functions, it also needs the real CT and real multi-label OAR segmentation. So, all images are converted into numpy (*.npz) images which can store multiple arrays as dictionaries. The dictionaries keys are 'CT', 'CBCT' and 'RTSTRUCTS'. Multiple RT structures (OAR segmentations), are converted into single multi-label image. We trained our models with Esophagus label = 4, spinal cord = 3, heart = 2 and lungs = 1.

For testing purpose, for any unavailable image in the pairing set, just put the CBCT image or a blank image in the final paired image in place of the missing image in the pair. Some example python scripts are provided in 'preprocess' subdirectory as examples of preprocessing needed to produce training/testing data. Data needs to be arranged in the following order:

datasets/XXX_Dataset 
    ├── test
    ├── val
    └── train

Training:

To train a model:

python train.py --dataroot ./datasets/XXX_Dataset
                --netG unet_128 
                --name Experiment_Name
                --stabilize 
                --input_nc 1 
                --output_nc 2 
                --model pix2pix3d 
                --direction BtoA 
                --lambda_L1 100 
                --dataset_mode cbct2ct3d 
                --norm batch --batch_size 1 
                --display_port 8097 
                --lr 0.0001 
                --display_freq 1000 
                --print_freq 1

More options?

You can find more options in:

Public Dataset:

The best model in our paper was trained with a combination of our internal Memorial Sloan Kettering Cancer Center (MSKCC) data and AAPM Thoracic Auto-segmentation Challenge data (http://aapmchallenges.cloudapp.net/competitions/3). Unfortunately, MSKCC data cannot be publicly released due to data privacy concerns. However, AAPM data is available publicly. We have augmented the AAPM data (which only contained plan CT and OAR segmentations) using our physics-based augmentation techniques followed by geometric (rotation/scale/shear/noise) augmentations to generate perfectly paired pCT/psCBCT/OAR pairs which can be used for training/testing. This augmented data is available for download here. The data needs to be converted to numpy (*.npz) format for training/testing using instructions above.

An example dataset is generated using ./preprocess/prep_test_data_pseudo_cbct_3D.py. The npz file is saved in ./datasets/psAAPM/test folder and the resulting synthetic CT and OAR segmentation, after running the testing script using the trained model, is saved in ./results/msk_aapm_stabilized_eso4/test_latest/npz_images

Physics-based Augmentation Pipeline:

The multitask CBCT to CT translation and OAR segmentation models are based on a physics-based artifact/noise induction data augmentation pipeline. Broadly, this is a methodology to extract different types of noise artifacts from CBCT images and adding these noise artifacts to planning CT (pCT) images to generate several synthetic pseudo-CBCT (psCBCT) images from a single registered pair of CT-CBCT images. Following figure shows example of a week 1 CBCT image deformably registered to a corresponding pCT image. Several types of noise artifacts are extracted from the registered week 1 CBCT, added to the corresponding pCT and followed by reconstruction using OS-SART technique leading to the generation of several pseudo-CBCT's from a single case. psCBCTs_imageFigure. Example slices of five different variations of psCBCT images generated by transferring artifacts/noise from week 1 CBCT to pCT and reconstructing using OS-SART technique. The five different variations correspond to different parameters in the power-law adaptive histogram equalization (PL-AHE) technique. The organs-at-risk masks manually segmented on pCT images are perfectly paired to all different psCBCT variations as well. These synthetic CBCT/CT are further augmented with geometric augmentations leading to creation of large database of perfectly paired pCT/psCBCT images from a small set of registered pCT and week1 CBCT images. This physics-based augmentation process is described in our separate paper: Generalizable cone beam CT esophagus segmentation using physics-based data augmentation link . There are several steps in this process and we provide code with data samples for all steps.

Artifact extraction:

The first step is to extract artifacts from deformably registered week1 CBCT using Power Law Adaptive Histogram Equilization. Code for this step is in the following folder:

Physics-based-Augmentation/Artifact-Induction 

The code was built and tested on Ubuntu 20.04 but the steps should be easily reproducible on other linux based systems. It requires ITK Insight Toolkit link and cmake-gui. Steps:

OS-SART based CBCT image reconstruction:

Artifact induced Pseudo-CBCT images are reconstructed using the open source TIGRE libraries. The codes/installation was tested in Anaconda virtual environment. Sample python script to reconstruct pseudo CBCT images is provided in subdirectory:

Physics-based-Augmentation/OSSART-Reconstruction 

Prerequisites:

NVIDIA GPU (Tested on NVIDIA RTX 2080 Ti)
CUDA CuDNN: cudatoolkit-dev
Python 3.7
numpy
cython
simpleitk (conda install -c simpleitk SimpleITK)

Geometric Augmentation:

The psCBCT images are further enhanced with geometric augmentations using imgaug library. Install the following prerequisities in an anaconda environment.

Python 3.7
conda install -c conda-forge imgaug
conda install -c conda-forge pynrrd
conda install scikit-image

Create custom dataset using Physics-ArX

If you have access to your own planning CT images and corresponding CBCTs, you can perform the following steps to create your own physics-augmented dataset. Note, this pipeline will work best if you have baseline CBCTs (acquired within 1-2 days of planning CT acquisition to ensure no change in tumor or the surrounding anatomy). Assuming you don't have significant morphological change between your planning CT and week 1 CBCT image, you will first (1) run deformable image registration (we used LDDMM) via the ANTs python library, (2) crop the registered planning CT images to that of the CBCT field-of-view, (3) now you will use the scripts (given in this repository) to (a) extract the scatter artifacts from the registered CBCT images via Adaptive Histogram Equalization, (b) add these artifacts to the registered/cropped planning CT images, (c) perform OS-SART CBCT reconstruction on these artifact-induced planning CTs via the TIGRE library (with parameter variations given in this repository), and finally (d) perform additional geometric augmentations. This should give you the physics-augmented dataset to drive your deep learning pipelines for different tasks including artifact correction, image registration, image segmentation, longitudinal image prediction, etc.

Stitching a full 512 x 512 image

Although the input for the network is 128 cubes it can still be used for larger image volumes. For many machines the standard CBCT output is 512x512x(~128). To do this I used high resolution data, low resolution data, and a normalization method to insure model outputs were scaled the same. It is important to normalize as the model's output can be variable leading to slitching artifacts.

First, overlapping low resolution images were stitched together. Images were cropped to the volume of interest which was about 256x400x128. This image was then downsampled by half in the x and y directions to get an image of 128x200x128. A uniform kernel with strides of 10 voxels in the y direction was used to create images for stitching. These images were saved and input into the model. The output of the model was normalized by insuring that the overlap between consecutive windows had the same mean and standard deviation. The mean of all outputs for a given region of the images was taken to give a low resolution image

Second, images were stitched together at the CBCT resolution. The volume of interest was again cropped to 256x400x128. A uniform kernel with strides of 16 in the x and y directions was used to create the images for stitching. The images were stitched by matching the mean and standard deviation of each high res image to the same region of the low resolution image. In this way a high resolution 512x512x128 image was formed as can be seen below:

stitched_image

Figure. Stitched image from CBCT lung patient to sCT for a 512x512x128 volume.

Issues

Please report all issues on the public forum.

License

© Nadeem Lab - This code is distributed under Apache 2.0 with Commons Clause license, and is available for non-commercial academic purposes.

Acknowledgments

Reference

If you find our work useful in your research or if you use parts of this code please consider citing our paper:

@article{Dahiya2021Multitask3C,
  title={Multitask 3D CBCT-to-CT Translation and Organs-at-Risk Segmentation Using Physics-Based Data Augmentation},
  author={N. Dahiya and S. Alam and Pengpeng Zhang and Si-Yuan Zhang and A. Yezzi and S. Nadeem},
  journal={Medical Physics},
  year={2021},
}

@article{alam2021generalizable,
  title={Generalizable cone beam CT esophagus segmentation using physics-based data augmentation},
  author={Alam, Sadegh R and Li, Tianfang and Zhang, Pengpeng and Zhang, Si-Yuan and Nadeem, Saad},
  journal={Physics in Medicine \& Biology},
  volume={66},
  number={6},
  pages={065008},
  year={2021},
  publisher={IOP Publishing}
}

@article{alam2021motion,
  title={Physics-based motion artifact simulation in CT/CBCT images using 4DCT phases},
  author={Alam, Sadegh R and Li, Tianfang and Zhang, Si-Yuan and Zhang, Pengpeng and Nadeem, Saad},
  journal={AAPM'21 abstract},
  year={2021}
}

@article{lee2023rmsim,
  title={RMSim: controlled respiratory motion simulation on static patient scans},
  author={Lee, Donghoon and Yorke, Ellen and Zarepisheh, Masoud and Nadeem, Saad and Hu, Yuchi},
  journal={Physics in Medicine and Biology},
  volume={68},
  issue={4},
  pages={045009}
}