Question about data overlap

zamling / PSALM

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Apache License 2.0

Question about data overlap #16

Open lichengshen opened 2 months ago

lichengshen commented 2 months ago

Hello, thanks for the great work.

I noticed that the RefCOCO val/test sets use images from the COCO training set. In joint training, I think this could cause a data leak: the test images and masks for RefCOCO would be seen while training on COCO-Panoptic. Is this true, or have you handled this somewhere?
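
One way to check whether this overlap exists is to intersect the image ids used for training with the image ids that appear in the RefCOCO evaluation splits. The following is only a minimal sketch, not code from PSALM or LISA. It assumes each referring-expression record is a dict with `image_id` and `split` keys and that the evaluation splits are named `val`, `test`, `testA`, and `testB`; the helper names and the toy data are hypothetical.

```python
from typing import Iterable, Mapping, Set

# Assumed names of the evaluation splits; adjust to the actual annotation files.
EVAL_SPLITS = frozenset({"val", "test", "testA", "testB"})


def eval_image_ids(refs: Iterable[Mapping], eval_splits=EVAL_SPLITS) -> Set[int]:
    """Collect the image ids of all records that belong to an evaluation split."""
    return {ref["image_id"] for ref in refs if ref["split"] in eval_splits}


def leaked_image_ids(train_image_ids: Iterable[int], refs: Iterable[Mapping]) -> Set[int]:
    """Return the training image ids that also appear in an evaluation split."""
    return set(train_image_ids) & eval_image_ids(refs)


# Toy data standing in for the real annotation files (hypothetical values).
toy_refs = [
    {"image_id": 1, "split": "train"},
    {"image_id": 2, "split": "val"},
    {"image_id": 3, "split": "testA"},
]
toy_train_ids = [1, 2, 3, 4]

leaked = leaked_image_ids(toy_train_ids, toy_refs)
print(sorted(leaked))  # [2, 3]
```

A non-empty result on the real annotation files would confirm that some evaluation images are also seen during training.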

Z-MU-Z commented 2 months ago

Hello, I noticed that the LISA paper says 'we exclude the COCO samples whose images are present in the refCOCO(+/g) validation sets during training.' However, I could not find this implemented in the code. Did you notice anything regarding this?
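
The exclusion described in that quote could be sketched roughly as below. This is an illustration, not the LISA or PSALM implementation. It assumes the training annotations are a COCO-style dict with an `images` list (each entry has an `id`) and an `annotations` list (each entry has an `image_id`); the function name `exclude_images` and the toy data are hypothetical.

```python
from typing import Dict, Set


def exclude_images(coco: Dict, excluded_ids: Set[int]) -> Dict:
    """Return a copy of a COCO-style dict without the excluded images.

    Both the image entries and the annotations that point to them are dropped.
    All other top-level keys (for example "categories") are kept unchanged.
    """
    filtered = dict(coco)  # shallow copy, the input dict is not modified
    filtered["images"] = [
        img for img in coco["images"] if img["id"] not in excluded_ids
    ]
    filtered["annotations"] = [
        ann for ann in coco["annotations"] if ann["image_id"] not in excluded_ids
    ]
    return filtered


# Toy COCO-style training annotations (hypothetical values).
toy_coco = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 2},
        {"id": 12, "image_id": 2},
        {"id": 13, "image_id": 3},
    ],
    "categories": [{"id": 1, "name": "person"}],
}

# Image ids assumed to appear in the refCOCO(+/g) evaluation splits.
toy_excluded = {2}

clean = exclude_images(toy_coco, toy_excluded)
print([img["id"] for img in clean["images"]])       # [1, 3]
print([ann["id"] for ann in clean["annotations"]])  # [10, 13]
```

Filtering the annotation dict once, before the dataset is constructed, keeps the training loop itself unchanged.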

zamling commented 2 months ago

Hi, all

I had not noticed this problem. We built our dataset based on LISA and UNINEXT. RefCOCO/+/g and COCO train2017 are built from LISA, so I do not know whether this data leak has been handled properly. Could you point me to the code in LISA or UNINEXT that does this processing, so that I can check whether I follow it correctly?