A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
I want to clarify the following statistic in the paper:
For the intra-scenario setting (where candidate descriptions for an image only come from the same scenario), there are 20,279 positive pairs and 53,383 negative pairs.
I enumerated all intra-scenario image-text pairs that have no target and found 29,067 in total. Does the 53,383 figure count object-sentence pairs rather than image-sentence pairs?
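For reference, here is a minimal sketch of the image-level count described above. It runs on toy data and does not use the dataset's actual loader: the `images`, `sentences`, and `annotations` structures and the `count_intra_pairs` helper are hypothetical names chosen for illustration, and the real annotation format may differ.

```python
from collections import Counter


def count_intra_pairs(images, sentences, annotations):
    """Count intra-scenario image-sentence pairs.

    images:      dict mapping image id -> scenario name (hypothetical layout)
    sentences:   dict mapping sentence id -> scenario name (hypothetical layout)
    annotations: iterable of (image id, sentence id), one entry per annotated object
    Returns (positive image-sentence pairs, negative image-sentence pairs,
             annotated objects inside intra-scenario pairs).
    """
    # Number of annotated objects for each (image, sentence) pair.
    objects_per_pair = Counter(annotations)

    positive = negative = objects = 0
    for image_id, image_scenario in images.items():
        for sentence_id, sentence_scenario in sentences.items():
            # Intra-scenario: the sentence must come from the image's scenario.
            if image_scenario != sentence_scenario:
                continue
            n = objects_per_pair[(image_id, sentence_id)]
            if n > 0:
                positive += 1
                objects += n
            else:
                # Image-sentence pair with no target.
                negative += 1
    return positive, negative, objects


# Toy data: two scenarios, three images, three sentences.
images = {1: "a", 2: "a", 3: "b"}
sentences = {10: "a", 11: "a", 20: "b"}
annotations = [(1, 10), (1, 10), (2, 11)]  # image 1 has two objects for sentence 10

positive, negative, objects = count_intra_pairs(images, sentences, annotations)
print(positive, negative, objects)
```

Counting this way treats each image-sentence combination as one pair regardless of how many objects it contains, which is why the result could differ from a count made at the object level.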
Thank you for the great work!