Closed: chencjGene closed this issue 4 years ago
Hi, Changjian,
Actually, only evaluation requires bounding boxes (e.g., Flickr does not have bounding box annotations). You can check Conceptual Captions, which is a large dataset with paired image-caption annotations. If you are asking for an evaluation dataset, maybe you can take a look at Visual Genome (VG), which has bounding boxes and captions. However, VG's captions describe regions instead of the whole image.
On Fri, Oct 23, 2020 at 10:15 AM Changjian Chen notifications@github.com wrote:
Hi,
To train and evaluate Cap2Det, datasets with both bounding-box annotations and captions are needed (like COCO and Flickr30K). I wonder if there are any other datasets like these two?
Best,
Changjian
Thanks, best regards,
Keren
Hi Keren,
Thanks. Yeah, I am looking for datasets for evaluation. Apart from Visual Genome, are there other datasets that have bounding boxes and captions describing the whole image?
There are only a few datasets that meet your requirements. Maybe you can mimic the approach used in Visual Commonsense Reasoning: use an off-the-shelf object detector to provide the bounding box annotations for evaluation.
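If you go the pseudo-annotation route, the detector's outputs still need to be packaged in whatever format your evaluation code expects. Below is a minimal sketch of that packaging step; the function name, box layout, and the 0.5 score threshold are illustrative assumptions, not part of Cap2Det. It converts one image's detections into COCO-style annotation dicts:

```python
# Hypothetical helper: turn off-the-shelf detector outputs into COCO-style
# ground-truth annotations for evaluation. All names/thresholds are assumptions.

def detections_to_coco(image_id, boxes, labels, scores, score_thresh=0.5):
    """Convert one image's detections into COCO-style annotation dicts.

    `boxes` are [x1, y1, x2, y2] corner coordinates; COCO's "bbox" field
    uses [x, y, width, height] instead.
    """
    annotations = []
    for box, label, score in zip(boxes, labels, scores):
        if score < score_thresh:
            continue  # keep only confident detections as pseudo ground truth
        x1, y1, x2, y2 = box
        annotations.append({
            "image_id": image_id,
            "category_id": label,
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "area": (x2 - x1) * (y2 - y1),
            "iscrowd": 0,
        })
    return annotations
```

For example, a detection scored 0.9 is kept and converted, while one scored 0.2 is dropped. The resulting dicts can then be collected into the "annotations" list of a COCO-format JSON and scored with standard detection metrics.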
Thanks, best regards,
Keren
Got it! Many thanks!