mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
https://grounding-anything.com

How to download images, including saiapr_tc-12, under the Refer_Segm folder? #9

Closed yliu-cs closed 9 months ago

LarsLiden commented 9 months ago

Same question. There doesn't seem to be a download link.


LarsLiden commented 9 months ago

I believe that this is where they can be downloaded but confirmation would be helpful: https://www-i6.informatik.rwth-aachen.de/imageclef/resources/saiaprtc12/

yliu-cs commented 9 months ago

> I believe that this is where they can be downloaded but confirmation would be helpful: https://www-i6.informatik.rwth-aachen.de/imageclef/resources/saiaprtc12/

Thanks! "Referring Expression Segmentation" may not require it; I was able to evaluate correctly on RefCOCO without it.

hanoonaR commented 9 months ago

Hi @yliu-cs and @LarsLiden,

Thank you both for your interest in our work and for reaching out with your questions.

To clarify, for the RefCLEF portion of our project, the images from the saiapr_tc-12 dataset are indeed necessary. You can download these images from the following link: saiapr_tc-12 dataset.
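For anyone following along, here is a minimal sketch of how the images could be fetched and unpacked under the Refer_Segm folder. The exact archive filename on the dataset page and the `Refer_Segm/images` target layout are assumptions, not confirmed by the maintainers; verify both against the dataset page above and the repo's dataset docs before relying on this.

```python
# Hypothetical helper: download the saiapr_tc-12 archive and unpack it
# under the Refer_Segm folder. The archive filename and the target layout
# are assumptions -- check the dataset page and the repo docs.
import tarfile
import urllib.request
from pathlib import Path

# Assumed archive location (directory page linked above); adjust to the
# actual file name listed there.
ARCHIVE_URL = "https://www-i6.informatik.rwth-aachen.de/imageclef/resources/saiaprtc12/saiapr_tc-12.tgz"
TARGET_DIR = Path("Refer_Segm/images")  # assumed layout expected by the repo


def download_saiapr_tc12() -> None:
    TARGET_DIR.mkdir(parents=True, exist_ok=True)
    archive_path = TARGET_DIR / "saiapr_tc-12.tgz"
    if not archive_path.exists():
        print(f"Downloading {ARCHIVE_URL} ...")
        urllib.request.urlretrieve(ARCHIVE_URL, archive_path)
    print(f"Extracting to {TARGET_DIR} ...")
    with tarfile.open(archive_path) as tar:
        tar.extractall(TARGET_DIR)


if __name__ == "__main__":
    download_saiapr_tc12()
```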

Regarding the training process for referring expression segmentation, as outlined in our training doc, the model is primarily fine-tuned on the RefCOCO, RefCOCO+, and RefCOCOg datasets. The RefCLEF (saiapr_tc-12) dataset is used for the demo model, which is trained on a mixture of open-source datasets.

I hope this clears up any confusion. If you have any more questions or need further assistance, feel free to ask.