Failure to replicate the CLIP concept generation experiment.

velvinnn commented 7 months ago

Thanks for your great work! I am trying to follow your steps and replicate the CLIP concept generation process on the Fitzpatrick17k split of the SkinCon dataset, but I only get an AUROC of 0.55. Could you please explain at a high level whether I did something wrong here?

1. Exclude any concept with fewer than 30 positive examples, and use the prompt 'This is {symptom}' for every symptom.
2. Every image is resized and center-cropped to 224x224, then normalized using the mean and standard deviation used in CLIP.
3. Use a pre-trained CLIP model from Hugging Face. I tried (a) openai/clip-vit-large-patch14, (b) openai/clip-vit-large-patch14-336, and (c) laion/CLIP-ViT-g-14-laion2B-s34B-b88K.
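
For reference, the evaluation implied by the steps above (dropping concepts with fewer than 30 positive examples, then measuring AUROC per remaining concept) can be sketched as follows. This is a minimal pure-Python illustration, not code from the repository: `auroc`, `evaluate_concepts`, and the two dictionaries are hypothetical names, and the scores are assumed to be whatever per-image value the CLIP model yields for a concept prompt.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; ties share ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def evaluate_concepts(labels_by_concept, scores_by_concept, min_positives=30):
    """Return {concept: AUROC}, skipping concepts with too few positives.

    Both arguments are hypothetical dicts mapping a concept name to a list
    with one entry per image (0/1 labels and float scores respectively).
    """
    results = {}
    for concept, labels in labels_by_concept.items():
        if sum(labels) < min_positives:
            continue  # excluded, as in step 1 above
        results[concept] = auroc(labels, scores_by_concept[concept])
    return results
```

An AUROC close to 0.5 means the scores rank positive images barely better than chance, so a value of 0.55 points to a problem in how the scores are produced rather than in the evaluation itself.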

Thank you in advance for any instructions!

chanwkimlab commented 7 months ago

Hi there! First off, thanks for your interest in our work! Please find my response to the points below.

1. Regarding the concept presence score, it is important to use reference prompts such as "This is skin photo" when calculating the score, as described in our paper. We also use multiple terms that represent the concept and then ensemble the model outputs.
2. That looks correct. More precisely, we used the mean and standard deviation from ImageNet pretraining, but I believe the impact of that difference on the output would be minimal.
3. Yes, we did use a pre-trained CLIP model for benchmarking purposes. However, please note that for our main experiment we used a CLIP model specifically trained on dermatology image-text pairs, so please ensure you have loaded the model weights properly.
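
The first point above (scoring against reference prompts and ensembling over several terms for a concept) could look roughly like the sketch below. It is an illustration under assumptions, not the authors' implementation: the two-way softmax against the reference similarity, the `temperature` value, and the names `concept_presence_score`, `term_embeddings`, and `reference_embeddings` are all hypothetical. The embeddings are assumed to come from the CLIP image and text encoders; the paper and the tutorial notebook remain the authoritative description.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_presence_score(image_embedding, term_embeddings,
                           reference_embeddings, temperature=0.02):
    """Hypothetical concept score for one image.

    term_embeddings: text embeddings of several prompts/terms for the concept.
    reference_embeddings: text embeddings of reference prompts
        (for example, a generic skin-photo prompt).
    temperature: assumed scaling constant, not taken from the paper.
    """
    # Average similarity of the image to the reference prompts.
    ref_sim = sum(_cosine(image_embedding, r)
                  for r in reference_embeddings) / len(reference_embeddings)

    per_term = []
    for t in term_embeddings:
        term_sim = _cosine(image_embedding, t)
        # Two-way softmax over (term, reference) similarities, which equals
        # a sigmoid of the temperature-scaled difference.
        per_term.append(1.0 / (1.0 + math.exp(-(term_sim - ref_sim) / temperature)))

    # Ensemble: average the per-term values.
    return sum(per_term) / len(per_term)
```

In this sketch, a per-term value above 0.5 means the image embedding is closer to that concept term than to the reference prompts on average, and the final score is the mean over all terms.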

Additionally, we provide a Jupyter notebook tutorial on automatic concept annotation so that people can try out our method easily. Please try it out as well. You can find the tutorial here.

velvinnn commented 7 months ago

Many thanks for the detailed instructions! I will look into it.

suinleelab / MONET

Failure to replicate the CLIP concept generation experiment. #2