Closed jacob-rosenthal closed 1 year ago
Figure 1 summarizes the modalities mentioned in the image captions. You can download the JSON file from the PMC-OA data (https://huggingface.co/datasets/axiong/pmc-oa) to get the corresponding captions, and then filter for ultrasonography figures. Note that the image names differ: an image named "PMC212319_Fig3_4.jpg" in PMC-OA is "PMC212319_Fig3.jpg" in our dataset, because PMC-OA contains non-compound figures while our first version uses compound figures.
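In case it helps, here is a rough sketch of that filtering step. It is not the code used for the paper: the record keys `image` and `caption`, the keyword list, and the helper names are all assumptions for illustration, so please check them against the actual PMC-OA JSON file.

```python
import re

# Assumed keyword list for spotting ultrasound captions; not taken from the paper.
ULTRASOUND_KEYWORDS = ("ultrasound", "ultrasonograph", "sonograph")

# Matches the trailing sub-figure index, e.g. the "_4" in "PMC212319_Fig3_4.jpg".
_SUBFIG_SUFFIX = re.compile(r"_\d+(\.\w+)$")


def to_compound_name(pmc_oa_name):
    """Map a PMC-OA sub-figure name to the compound-figure name."""
    return _SUBFIG_SUFFIX.sub(r"\1", pmc_oa_name)


def ultrasound_figures(records, keywords=ULTRASOUND_KEYWORDS):
    """Return compound-figure names whose caption mentions any keyword.

    `records` is an iterable of dicts with "image" and "caption" keys
    (assumed field names).
    """
    names = set()
    for rec in records:
        caption = rec["caption"].lower()
        if any(k in caption for k in keywords):
            names.add(to_compound_name(rec["image"]))
    return names


# Tiny in-memory example standing in for the downloaded JSON records.
records = [
    {"image": "PMC212319_Fig3_4.jpg", "caption": "Ultrasonography of the liver."},
    {"image": "PMC212319_Fig1_2.jpg", "caption": "Chest X-ray at admission."},
]
selected = ultrasound_figures(records)
```

With the real file, `records` would come from parsing the downloaded JSON, and `selected` could then be compared against whichever column holds the figure name in the question-answer data. Keyword matching on captions is only an approximation, so a caption that mentions ultrasound in passing will also be picked up.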
Hi, thanks for making this public! I am wondering if there is a way to take subsets of the dataset by image type, like what you show in Figure 1 of the paper. For example, getting only the question-answer pairs for ultrasonography figures. Looking at the dataset on Hugging Face, I can't see any columns that contain figure type labels. Thanks!!