suinleelab / MONET

Transparent medical image AI via an image–text foundation model grounded in medical literature

About dataset #1

SiyuanYan1 closed this issue 2 months ago

SiyuanYan1 commented 1 year ago

Thanks for your great work! Could you tell us whether you plan to release the image–text pair dataset? Also, could you provide some examples of what the captions look like, other than the example in Fig. 1?

chanwkimlab commented 1 year ago

Thanks for your interest in our work! We do not redistribute the image–text pairs from our training dataset directly. However, we have released scripts for constructing the dataset; they can be found under scripts/preprocess. For the PubMed portion, the scripts download the open-access PubMed articles, filter out non-dermatology images, and extract image–text pairs from scratch. For the textbook portion, given the PDF files, the scripts filter out non-dermatology images, match text to images, and extract the image–text pairs.

The caption shown in Figure 1A is for demonstration purposes only. In practice, the captions are simply the figure legends from the PubMed articles and textbooks, and they also contain words that are not dermatology concepts.
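To give a rough idea of what "captions are just figure legends" means for the PubMed portion, here is a minimal sketch of the figure–legend pairing step only. It is not the code in scripts/preprocess. It assumes the open-access articles are in JATS XML, as distributed by PubMed Central, where each figure is a `<fig>` element holding a `<caption>` and a `<graphic>` whose `xlink:href` names the image file. The function name `extract_figure_pairs` and the sample article are hypothetical, and the dermatology image filter is not shown.

```python
import xml.etree.ElementTree as ET

# JATS XML references image files through the XLink namespace.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_figure_pairs(jats_xml: str) -> list[tuple[str, str]]:
    """Return (image_reference, legend_text) pairs for every <fig> in a JATS article.

    Hypothetical helper for illustration; not part of the MONET repository.
    """
    root = ET.fromstring(jats_xml)
    pairs = []
    for fig in root.iter("fig"):
        graphic = fig.find(".//graphic")
        caption = fig.find("caption")
        if graphic is None or caption is None:
            continue  # skip figures lacking an image file or a legend
        href = graphic.get(XLINK_HREF)
        # Flatten nested markup (<title>, <p>, <italic>, ...) and collapse whitespace.
        legend = " ".join("".join(caption.itertext()).split())
        if href and legend:
            pairs.append((href, legend))
    return pairs


# Hypothetical single-figure article used only to demonstrate the output shape.
sample = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="f1">
      <label>Figure 1</label>
      <caption><p>Erythematous plaques on the <italic>dorsal</italic> hand.</p></caption>
      <graphic xlink:href="fig1.jpg"/>
    </fig>
  </body>
</article>"""

print(extract_figure_pairs(sample))
# [('fig1.jpg', 'Erythematous plaques on the dorsal hand.')]
```

The legend is kept verbatim apart from whitespace, which is why such captions can contain many words that are not dermatology concepts.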