CARZero (Cross-attention Alignment for Radiology Zero-Shot Classification) is a multimodal representation learning framework for label-efficient medical image recognition. It uses cross-attention to align image and text modalities, enabling accurate zero-shot classification in the medical imaging domain with minimal labeling effort.
CARZero Manuscript
Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, Zhiyang He, Xiaodong Tao, S. Kevin Zhou
University of Science and Technology of China
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
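For intuition, the sketch below shows one way a cross-attention alignment head can score a set of disease prompts against image patch features: prompt embeddings act as queries over the patches, and the attended visual feature is scored against each prompt. The module, dimensions, and scoring rule are illustrative assumptions, not the exact CARZero implementation.

import torch
import torch.nn as nn

class CrossAttentionAlignment(nn.Module):
    """Illustrative cross-attention alignment head (not the official code)."""
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scale = dim ** -0.5

    def forward(self, text_emb, patch_emb):
        # text_emb:  (B, P, D), one embedding per disease prompt
        # patch_emb: (B, N, D), image patch features from the ViT encoder
        attended, _ = self.attn(query=text_emb, key=patch_emb, value=patch_emb)
        # Score each prompt against its attended visual feature -> (B, P) logits
        return (attended * text_emb).sum(-1) * self.scale

align = CrossAttentionAlignment()
logits = align(torch.randn(2, 5, 768), torch.randn(2, 196, 768))  # 5 prompts, 196 patches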
File name lists are saved in the Dataset
folder, which can be downloaded here. Please update the PATH in the file name lists to point to your image storage location.
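As an example of what that update can look like, the snippet below rewrites a path prefix inside one list file. The OLD_ROOT/NEW_ROOT values and the list file name are hypothetical placeholders; adapt them to the actual files in the Dataset folder.

from pathlib import Path

OLD_ROOT = "/path/to/original/images"   # placeholder prefix found in the lists
NEW_ROOT = "/your/image/storage"        # your local image location

list_file = Path("Dataset/train_filenames.csv")  # hypothetical list name
list_file.write_text(list_file.read_text().replace(OLD_ROOT, NEW_ROOT))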
For the image encoder, ViT-B/16 is used, pre-trained with MAE and M3AE. Our pre-trained models are available for download here; place them in the ./pretrain_model
folder.
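A minimal sketch of loading such a checkpoint into a ViT-B/16 backbone, using timm for brevity; the checkpoint file name and state-dict layout are assumptions, not the repo's exact loading code.

import torch
import timm

# Build a ViT-B/16 and load the pre-trained weights non-strictly, since an
# MAE/M3AE checkpoint may carry extra decoder keys.
vit = timm.create_model("vit_base_patch16_224", pretrained=False)
state = torch.load("./pretrain_model/vit_b16_m3ae.pth", map_location="cpu")  # hypothetical file name
missing, unexpected = vit.load_state_dict(state, strict=False)
print(f"missing={len(missing)}, unexpected={len(unexpected)}")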
For text encoding, BioBERT is fine-tuned on MIMIC and PadChest reports and is available through Hugging Face:
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Laihaoran/BioClinicalMPBERT")
model = AutoModel.from_pretrained("Laihaoran/BioClinicalMPBERT")
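Once loaded, the encoder can embed a report or prompt in the standard Hugging Face way; the sentence below and the use of the [CLS] token are just an illustration, not the exact prompt format CARZero uses.

import torch

inputs = tokenizer("There is no evidence of pneumothorax.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
cls_emb = outputs.last_hidden_state[:, 0]  # (1, hidden_size) [CLS] embedding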
Start by installing PyTorch 1.12.1 with the right CUDA version, then clone this repository and install the dependencies.
$ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
$ git clone git@github.com:laihaoran/CARZero.git
$ cd CARZero
$ conda env create -f environment.yml
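An optional sanity check that the installed PyTorch build matches your CUDA driver and sees a GPU:

import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())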
To test the performance of our model, download the trained CARZero weights and place them in the ./pretrain_model
folder. Then run the test script:
sh test.sh
Example configurations for pre-training can be found in the ./configs
folder. All training is done with the run.py
script. Run the training script:
sh train.sh
Note: training increases the batch size in stages, from 64 to 128 and finally 256. A GPU with 80 GB of memory is recommended to accommodate the largest stage.
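If an 80 GB GPU is not available, gradient accumulation is a standard way to emulate the larger effective batch. The sketch below uses placeholder model and data names and is not wired into run.py; whether run.py exposes such an option is an assumption to verify.

import torch

model = torch.nn.Linear(16, 1)                               # placeholder model
optimizer = torch.optim.AdamW(model.parameters())
loader = [(torch.randn(64, 16), torch.randn(64, 1))] * 8     # placeholder micro-batches of 64
accum_steps = 4                                              # 4 x 64 = effective batch of 256

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()                                          # accumulate gradients
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()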
@inproceedings{lai2024carzero,
title={{CARZero}: Cross-attention alignment for radiology zero-shot classification},
author={Lai, Haoran and Yao, Qingsong and Jiang, Zihang and Wang, Rongsheng and He, Zhiyang and Tao, Xiaodong and Zhou, S Kevin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={11137--11146},
year={2024}
}
Our sincere thanks to the contributors of MAE, M3AE, and BERT for their foundational work, which greatly facilitated the development of CARZero.