
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

CARZero (Cross-Attention Alignment for Radiology Zero-Shot Classification) is a multimodal representation learning framework designed to enhance medical image recognition with minimal labeling effort. It leverages cross-attention to align image and text modalities, enabling label-efficient learning and accurate classification in the medical imaging domain.
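
The core mechanism can be sketched in a few lines of PyTorch. The following is an illustration only, not the paper's exact architecture: text prompt embeddings act as queries that attend over image patch embeddings, and the similarity between each aligned feature and its prompt yields a per-label score. All class names and dimensions here are hypothetical.

import torch
import torch.nn as nn

class CrossAttentionAligner(nn.Module):
    """Toy cross-attention block: text prompt embeddings attend over
    image patch embeddings, yielding one aligned feature per prompt."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, prompt_emb, patch_emb):
        # prompt_emb: (B, num_prompts, dim) -- queries, one per candidate label
        # patch_emb:  (B, num_patches, dim) -- keys/values from the image encoder
        aligned, _ = self.attn(prompt_emb, patch_emb, patch_emb)
        return aligned  # (B, num_prompts, dim)

# Dot product between each aligned feature and its prompt gives a zero-shot score.
aligner = CrossAttentionAligner()
prompts = torch.randn(1, 5, 768)    # e.g. 5 disease prompts
patches = torch.randn(1, 196, 768)  # e.g. 14x14 patches from ViT-B/16
scores = (aligner(prompts, patches) * prompts).sum(-1)  # (1, 5)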

CARZero Manuscript
Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, Zhiyang He, Xiaodong Tao, S. Kevin Zhou
University of Science and Technology of China
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Approach

[Figure: overview of the CARZero framework]

Dataset Overview

Training Data

Inference Data

Data Pre-processing

Filename lists are saved in the Dataset folder, which can be downloaded here. Please update the PATH in the filename lists to point to your image storage location.
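
The exact format of the filename lists is not documented here, so the following is a hypothetical sketch, assuming plain-text or CSV lists that contain an absolute path prefix to replace:

# Hypothetical helper: rewrite the image-path prefix in each filename list.
# OLD_PREFIX is an assumed placeholder; adjust to the actual format of the
# files in the Dataset folder.
from pathlib import Path

OLD_PREFIX = "/path/to/original/images"  # assumed placeholder in the lists
NEW_PREFIX = "/data/my_images"           # your local image storage location

for list_file in Path("Dataset").glob("*.csv"):
    text = list_file.read_text()
    list_file.write_text(text.replace(OLD_PREFIX, NEW_PREFIX))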

Pretraining Model

For the image encoder, ViT-B/16 is used, pre-trained with MAE and M3AE. Our pre-trained models are available for download here; place them in the ./pretrain_model folder.

For text encoding, BioBERT is fine-tuned on MIMIC and PadChest reports and is available through Hugging Face:

from transformers import AutoModel, AutoTokenizer

# Load the report-finetuned BioBERT text encoder and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Laihaoran/BioClinicalMPBERT")
model = AutoModel.from_pretrained("Laihaoran/BioClinicalMPBERT")
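
Continuing from the snippet above, here is a minimal example of encoding candidate disease prompts. Using the [CLS] token as the sentence embedding is an illustrative assumption, not necessarily the pooling CARZero uses:

import torch

# Encode a few candidate disease prompts into fixed-size embeddings.
prompts = ["There is pneumonia.", "There is cardiomegaly.", "No finding."]
inputs = tokenizer(prompts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
prompt_emb = outputs.last_hidden_state[:, 0]  # [CLS] embeddings, (num_prompts, hidden_dim)
print(prompt_emb.shape)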

Getting Started

Start by installing PyTorch 1.12.1 with the right CUDA version, then clone this repository and install the dependencies.

$ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
$ git clone git@github.com:laihaoran/CARZero.git
$ cd CARZero
$ conda env create -f environment.yml

Zero-shot classification for multi-label datasets (OpenI, PadChest, ChestXray14, CheXpert, ChestXDet10)

If you want to test the performance of our model, you can download the trained CARZero weights and place them in the ./pretrain_model folder. Then run the test script:

sh test.sh
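
test.sh wraps the repository's own evaluation pipeline. As a rough, generic sketch of what multi-label zero-shot evaluation computes, here is per-class AUC over an (N, C) matrix of image-prompt similarity scores, with placeholder data in place of real model outputs:

import numpy as np
from sklearn.metrics import roc_auc_score

def per_class_auc(scores, labels):
    # scores: (N, C) image-prompt similarities; labels: (N, C) binary ground truth
    aucs = {}
    for c in range(labels.shape[1]):
        if labels[:, c].min() != labels[:, c].max():  # AUC needs both classes present
            aucs[c] = roc_auc_score(labels[:, c], scores[:, c])
    return aucs

rng = np.random.default_rng(0)
scores = rng.random((100, 10))               # placeholder similarity scores
labels = rng.integers(0, 2, size=(100, 10))  # placeholder ground truth
print(per_class_auc(scores, labels))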

Training

Example configurations for pretraining can be found in ./configs. All training is done using the run.py script. Run the training script:

sh train.sh

Note: The batch size increases in stages during training, from 64 to 128 and finally to 256. Keep this progression in mind and use a GPU with 80 GB of memory to accommodate the largest stage.

Citation

@inproceedings{lai2024carzero,
  title={{CARZero}: Cross-attention alignment for radiology zero-shot classification},
  author={Lai, Haoran and Yao, Qingsong and Jiang, Zihang and Wang, Rongsheng and He, Zhiyang and Tao, Xiaodong and Zhou, S Kevin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11137--11146},
  year={2024}
}

Acknowledgements

Our sincere thanks to the contributors of MAE, M3AE, and BERT for their foundational work, which greatly facilitated the development of CARZero.