ibrahimethemhamamci / CT-CLIP

A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities

Question about validation set #19

Open DwanZhang-AI opened 2 months ago

DwanZhang-AI commented 2 months ago

Hi there,

We noticed that the figure reports 1564 scans for the internal validation set, while the validation set on HF and in the paper has 1304 cases/3039 scans. Could you tell me which subset you used to validate your model?

Thanks

sezginerr commented 2 months ago

Hi @DwanZhang-AI,

The validation split is provided in the Huggingface repository. There are 1304 patients, each of whom has at least one scan (but possibly multiple). There are 1564 scans in total. Each scan has different DCM series for different reconstructions, due to the different kernels used in the preprocessing step. Thus, the number of reconstructions is 3039. I hope this makes it clearer!
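
The patient / scan / reconstruction hierarchy can be illustrated with a short counting sketch. It assumes, purely for illustration, that each volume is named `valid_<patient>_<scan>_<reconstruction>.nii.gz`; this naming pattern and the helper `count_levels` are assumptions that are not stated in this thread, so the parsing should be adjusted to the actual file names in the repository.

```python
from typing import Iterable, Tuple


def count_levels(names: Iterable[str]) -> Tuple[int, int, int]:
    """Return (patients, scans, reconstructions) for a list of volume names.

    Assumes hypothetical names of the form
    valid_<patient>_<scan>_<reconstruction>.nii.gz
    """
    patients = set()
    scans = set()
    reconstructions = set()
    for name in names:
        stem = name.split(".")[0]  # drop the ".nii.gz" suffix
        _split, patient, scan, recon = stem.split("_")
        patients.add(patient)                        # one entry per patient
        scans.add((patient, scan))                   # one entry per CT scan
        reconstructions.add((patient, scan, recon))  # one entry per kernel
    return len(patients), len(scans), len(reconstructions)


# Toy example: patient 1 has two scans, one of them with two reconstructions.
example = [
    "valid_1_a_1.nii.gz",
    "valid_1_a_2.nii.gz",
    "valid_1_b_1.nii.gz",
    "valid_2_a_1.nii.gz",
]
print(count_levels(example))  # (2, 3, 4)
```

If the naming assumption holds, running the same count over the full validation split should give 1304, 1564 and 3039.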

xychen2022 commented 1 month ago

Hi, I was using the provided pretrained weights to reproduce the results in Table 1. However, my reproduced results didn't match what you reported. Could you please let me know where I might have gone wrong? BTW, I simply took the average of the AUROCs of each image and then averaged across all images.

sezginerr commented 1 month ago

Hi @xychen2022, we will update the values shortly in the second version of the preprint. There was a problem with the spacings, which we have discussed and solved. The results should be very similar, though (with scores a little higher than the reported ones). See our discussion here: https://huggingface.co/datasets/ibrahimhamamci/CT-RATE/discussions/58

xychen2022 commented 1 month ago

@sezginerr I have noticed this and made corrections to the training and validation data based on the provided metadata. Are the pretrained weights based on wrong spacings?

sezginerr commented 1 month ago

No, the training was done with the correct spacings (therefore the weights are correct). We will update the weights nevertheless to make them more memory efficient in the coming days (possibly 1-2 weeks). This issue occurred when we cleaned up the data structure to make it more organized before publishing the dataset. We put the validation values from the cleaned version into the preprint, where the validation spacings were not correct in some instances. We have fixed this issue in the repository but have not yet updated the values in the preprint. So, in short, the weights are correct :).

xychen2022 commented 1 month ago

What do you mean by "update the weights nevertheless to make them more memory efficient"? I assumed you would not change the network architecture. BTW, is my way of calculating correct, and the same as yours? I hope you can also clarify the 1304 patients, 1564 scans and 3039 reconstructions in your next update. What is the difference between those 1564 scans and 3039 reconstructions? Did you evaluate on a subset of the data? Thanks!

sezginerr commented 1 month ago

The network architecture might be updated slightly if everything works with the same accuracy and better memory usage. We are actually experimenting with this right now, so unfortunately I cannot say for sure, at least for the next few days.

Regarding the numbers, I did not understand what you mean. The numbers should be correct regardless of the spacing issue. The 1304 patients have 1564 CT scans. Each CT scan is reconstructed with different kernels (at least one, but possibly more for different windowings), which gives 3039 reconstructions in total. Do you see any problem with this? We did not experiment with a subset of the data; our validation dataset is exactly the same as in the paper.

xychen2022 commented 1 month ago

@sezginerr Thanks for the clarification. I was confused by scan and reconstruction since I am not an expert on this.

xychen2022 commented 1 month ago

@sezginerr aurocs.xlsx

I tested the model performance with the shared weights, but the results I obtained are worse than those reported in the preprint. I have attached the output file ‘aurocs.xlsx’ obtained with ‘CT_VocabFine.pt’. Could you please let me know how it compares to your results and how you calculated the overall performance? Did you simply average the numbers? Thanks!
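
For comparison, one common way to get a single overall number is a macro average: compute one AUROC per abnormality over all validation volumes, then take the unweighted mean of those values. The sketch below shows that computation in plain Python. Nothing in this thread confirms that this is the authors' procedure, and the function names, abnormality names and toy data are hypothetical.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    The pairwise loop is quadratic, which is fine for a small sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(labels: Dict[str, Sequence[int]],
                scores: Dict[str, Sequence[float]]) -> float:
    """Unweighted mean of the per-abnormality AUROCs."""
    per_class = [auroc(labels[name], scores[name]) for name in labels]
    return sum(per_class) / len(per_class)


# Toy data: two hypothetical abnormalities, four volumes each.
toy_labels = {"abnormality_a": [1, 0, 1, 0], "abnormality_b": [0, 0, 1, 1]}
toy_scores = {"abnormality_a": [0.9, 0.2, 0.4, 0.6],
              "abnormality_b": [0.1, 0.3, 0.8, 0.7]}
print(macro_auroc(toy_labels, toy_scores))  # 0.875
```

Note that an AUROC is defined over a set of volumes for one abnormality, so it needs both positive and negative cases; averaging per-image values first would be a different quantity.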