salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

Are the top-two captions per image included in the BLIP-2 training benchmark datasets? #354

Open inkzk opened 1 year ago

inkzk commented 1 year ago

According to the BLIP-2 paper:

> We adopt the CapFilt method (Li et al., 2022) to create synthetic captions for the web images... We keep top-two captions per image as training data and randomly sample one at each pre-training step.

Thus I wonder whether the top-two captions are included in the training datasets from the benchmark link. Furthermore, the LAION-400M dataset is not found in the benchmark link; any idea?
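
For concreteness, the sampling behavior quoted above could look like the following minimal sketch (an illustration only, not actual LAVIS code; `records` and its `image_path`/`captions` fields are hypothetical placeholders):

```python
# Minimal sketch: store the top-two synthetic captions per image and
# randomly sample one of them at every training step, as described in
# the BLIP-2 paper. `records` is a hypothetical list of dicts, e.g.
# {"image_path": "...", "captions": ["caption A", "caption B"]}.
import random

from PIL import Image
from torch.utils.data import Dataset


class TopTwoCaptionDataset(Dataset):
    def __init__(self, records, transform=None):
        self.records = records
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        image = Image.open(rec["image_path"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        # pick one of the two kept captions; a fresh random draw per step
        caption = random.choice(rec["captions"])
        return image, caption
```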

LiJunnan1992 commented 1 year ago

We have not yet released the synthetic captions for LAION. You may consider generating them yourself if needed. Thanks.
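
In case it helps: below is a rough sketch of generating candidate captions with the BLIP captioner shipped in LAVIS, following the usage shown in the LAVIS README (`example.jpg` is a placeholder path). The CapFilt-style filtering that ranks candidates and keeps the top two per image needs an additional image-text matching model and is not shown here.

```python
# Sketch: generate diverse candidate captions for one image with the
# BLIP captioner from LAVIS (usage as in the LAVIS README).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# loads the captioning model together with its image preprocessors
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="large_coco", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# nucleus sampling yields diverse candidates to filter/rank afterwards
captions = model.generate(
    {"image": image}, use_nucleus_sampling=True, num_captions=10
)
print(captions)
```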

inkzk commented 1 year ago

> We have not yet released the synthetic captions for LAION. You may consider generating them yourself if needed. Thanks.

Thanks for the reply. Have the synthetic captions for the other datasets been released?