jinxixiang opened 1 year ago
From left to right, the figures represent text2image R@1, R@50, R@200, followed by image2text R@1, R@50, R@200.
This is very valid and points to some form of leakage that is expected with BiomedCLIP. Thank you for the evaluations; I will make sure to add a note to the README in future updates!
Dear Author,
The ARCH dataset is divided into two subsets: the books_set and the pubmed_set.
I have noticed that the pubmed_set appears to overlap with BiomedCLIP's training data, which is sourced from PubMed Central.
In your paper, you combined these two datasets for cross-modality retrieval. However, I decided to separate them and compare their performance individually.
The retrieval performance of BiomedCLIP on the two subsets was:

| Subset | text2image R@1 | R@50 | R@200 | image2text R@1 | R@50 | R@200 |
| --- | --- | --- | --- | --- | --- | --- |
| pubmed_set | 15.7 | 79.8 | 94.4 | 16.7 | 78.9 | 93.7 |
| books_set | 7.3 | 49.2 | 74.2 | 8.2 | 49.7 | 73.2 |

In contrast, the performance of QUILT-GPT/77 was much more even across the subsets:

| Subset | text2image R@1 | R@50 | R@200 | image2text R@1 | R@50 | R@200 |
| --- | --- | --- | --- | --- | --- | --- |
| pubmed_set | 1.8 | 23.6 | 46.0 | 1.6 | 23.4 | 45.7 |
| books_set | 1.8 | 27.7 | 52.8 | 1.5 | 23.4 | 46.4 |
From these results, it is clear that QUILT-GPT/77 shows no significant gap between the two subsets, whereas BiomedCLIP does. This suggests that BiomedCLIP's strong pubmed_set numbers reflect training-data overlap rather than a genuine domain gap between the subsets.
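For reference, recall@K on paired retrieval data like the figures above can be computed from a text-image similarity matrix along these lines. This is an illustrative sketch, not the evaluation code used in the paper; `retrieval_recall_at_k` and the random example are my own naming and assumptions (in particular, that pair i of the queries matches gallery item i).

```python
import numpy as np

def retrieval_recall_at_k(similarity: np.ndarray, ks=(1, 50, 200)):
    """Recall@K (in percent) for paired cross-modal retrieval.

    similarity[i, j] is the score between query i (e.g. a caption) and
    gallery item j (e.g. an image); the ground-truth match for query i
    is assumed to be gallery item i.
    """
    n = similarity.shape[0]
    # Score of each query's true match (the diagonal).
    true_scores = similarity[np.arange(n), np.arange(n)]
    # Rank of the true match = number of gallery items scoring strictly higher.
    ranks = (similarity > true_scores[:, None]).sum(axis=1)
    return {k: float((ranks < k).mean() * 100) for k in ks}

# Toy example: random similarities with the true pairs boosted.
rng = np.random.default_rng(0)
sim = rng.normal(size=(300, 300))
sim[np.arange(300), np.arange(300)] += 3.0

t2i = retrieval_recall_at_k(sim)    # text2image: rows are text queries
i2t = retrieval_recall_at_k(sim.T)  # image2text: transpose the matrix
```

Running both directions on the pubmed_set and books_set embeddings separately would reproduce the kind of six-number summaries reported above.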