ibrahimethemhamamci / CT-CLIP

Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography

Error while reproducing the project #11

Closed guohahah closed 7 months ago

guohahah commented 7 months ago

Hello! I want to try your zero-shot model on my own data, but I ran into a problem when running run_zero_shot.py. With the pre-trained .pt file in place, I set:

data_folder = '/dataset_metadata_validation_metadata.csv',
reports_file= "dataset_radiology_text_reports_validation_reports.csv",
labels = "dataset_multi_abnormality_labels_valid_predicted_labels.csv",

All three .csv files were downloaded from your Hugging Face dataset. However, the dataloader in zero_shot.py does not seem to read the data correctly and throws the error below:

$ CUDA_VISIBLE_DEVICES=2 python run_zero_shot.py
/.conda/envs/ct_clip/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/.conda/envs/ct_clip/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_zero_shot.py", line 43, in <module>
    inference = CTClipInference(
  File "/CT-CLIP-main/scripts/zero_shot.py", line 179, in __init__
    self.dl = DataLoader(
  File "/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 351, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/.local/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

How can I solve this problem? I'm not sure whether any of my settings are wrong. Also, if I want to use the model directly to diagnose new CT cases, is it enough to run run_zero_shot.py as I'm doing now?

sezginerr commented 7 months ago

Hello @guohahah, thank you very much for your interest!

The data_folder variable should point to the directory containing the preprocessed images, not to the metadata CSV file. Because it currently points to a CSV, no volumes are found and the dataset is empty, which is why the sampler reports num_samples=0. First, download the validation volumes using the script provided here (download_only_valid_data.py): https://github.com/sezginerr/example_download_script.

To access the data on Hugging Face, you must agree to the terms and conditions (I believe you have already done this, since you downloaded the CSV files), obtain a personal access token from your Hugging Face settings, and set this token in the script before starting the download. The script will then retrieve the validation dataset for you.
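As a rough illustration of what such a token-based download can look like, here is a minimal sketch built on `snapshot_download` from the `huggingface_hub` package. It is not the linked download_only_valid_data.py script. The helper names, and the repo id and folder pattern you pass in, are placeholders to be checked against the dataset page.

```python
from typing import Any, Dict


def build_download_kwargs(token: str, local_dir: str,
                          repo_id: str, pattern: str) -> Dict[str, Any]:
    """Collect the arguments for huggingface_hub.snapshot_download.

    repo_id and pattern are NOT given in this thread; the caller must
    supply them (hypothetical helper, for illustration only).
    """
    if not token:
        raise ValueError("A Hugging Face access token is required for gated datasets.")
    return {
        "repo_id": repo_id,            # dataset repository on Hugging Face
        "repo_type": "dataset",        # a dataset repo, not a model repo
        "token": token,                # personal access token from your settings
        "local_dir": local_dir,        # where the files are written
        "allow_patterns": [pattern],   # restrict the download to matching paths
    }


def download_validation_volumes(token: str, local_dir: str,
                                repo_id: str, pattern: str) -> str:
    """Download only the files matching `pattern`; returns the local folder."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(**build_download_kwargs(token, local_dir, repo_id, pattern))


# Example call (placeholders, not real values):
# download_validation_volumes(token="hf_...", local_dir="./valid_raw",
#                             repo_id="<dataset-repo-id>", pattern="<valid-folder>/*")
```

Restricting the download with `allow_patterns` avoids fetching the training split when only the validation volumes are needed.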

Next, run the preprocessing scripts found here: https://github.com/ibrahimethemhamamci/CT-CLIP/tree/main/data_preprocess. Usage instructions for these scripts are given at that link.

After preprocessing, be sure to update the data_folder variable to the directory path (or symbolic link) of the preprocessed volumes.
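Before launching run_zero_shot.py, a quick check of the three paths can catch this mistake early. The following is a minimal sketch, not code from the repository. The function names are hypothetical, and the `.npz` suffix for preprocessed volumes is an assumption that you should adjust to whatever the preprocessing scripts actually write.

```python
from pathlib import Path
from typing import List


def find_volumes(data_folder: str, suffix: str = ".npz") -> List[Path]:
    """Return all files ending in `suffix` under data_folder (recursively).

    The ".npz" default is an assumption about the preprocessed format.
    A path that is not a directory (for example a CSV file) yields [].
    """
    root = Path(data_folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*" + suffix) if p.is_file())


def check_inputs(data_folder: str, reports_file: str, labels: str,
                 suffix: str = ".npz") -> int:
    """Validate the three settings and return the number of volumes found."""
    if not Path(data_folder).is_dir():
        raise NotADirectoryError(
            f"data_folder must be a directory of preprocessed volumes, got: {data_folder}")
    for csv_path in (reports_file, labels):
        if not Path(csv_path).is_file():
            raise FileNotFoundError(f"CSV file not found: {csv_path}")
    volumes = find_volumes(data_folder, suffix)
    if not volumes:
        # An empty dataset is what later surfaces as num_samples=0.
        raise ValueError(f"No '{suffix}' files found under {data_folder}")
    return len(volumes)
```

Calling `check_inputs(...)` with the original settings from this issue would raise NotADirectoryError straight away, because data_folder pointed at a CSV file rather than a directory.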

I hope these instructions are clear. Feel free to reach out if you have any further questions or require assistance.

guohahah commented 7 months ago

Thank you very much for your detailed reply, I'll give it a try!

sezginerr commented 7 months ago

Hi @guohahah, I am closing the issue for now. You can reopen it if you have further questions.