mlfoundations / datacomp

DataComp: In search of the next generation of multimodal datasets
http://datacomp.ai/

Dataset Size on Leaderboard #62

Closed brunnedu closed 1 year ago

brunnedu commented 1 year ago

Hey there,

While reviewing the leaderboard submissions for the small filtering track, I noticed several entries with a reported dataset size of 1.3e7, which is essentially the size of the original dataset without any filtering. Did these submissions really keep almost all of the original data (which I doubt, given submission titles such as BLIP2-COCO-finetuned_similarity_top-35%), or is this an error?

Thanks for clarifying!

gabrielilharco commented 1 year ago

Hi @brunnedu. Thanks for the comment! My best guess is that the authors of those submissions forgot to correctly pass the flag specifying the dataset size.
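
As a rough sanity check on the numbers discussed above, the sketch below compares a reported dataset size against what a "top-35%" filter would be expected to keep. It is only an illustration: the pool size of 12.8M is an assumption for the small track (the thread itself only gives 1.3e7 as a rounded figure), and the helper names and the 5% tolerance are hypothetical, not part of the DataComp tooling.

```python
def expected_size(pool_size: int, keep_fraction: float) -> int:
    """Number of samples kept when retaining `keep_fraction` of the pool."""
    return round(pool_size * keep_fraction)


def looks_unfiltered(reported_size: float, pool_size: int, tolerance: float = 0.05) -> bool:
    """True if the reported size is within `tolerance` (relative) of the full pool."""
    return abs(reported_size - pool_size) <= tolerance * pool_size


# Assumption: the small-track pool holds about 12.8M samples.
POOL_SIZE = 12_800_000

top35 = expected_size(POOL_SIZE, 0.35)
print(f"Expected size for a top-35% filter: {top35:,}")
print(f"Reported 1.3e7 looks unfiltered: {looks_unfiltered(1.3e7, POOL_SIZE)}")
print(f"Expected top-35% size looks unfiltered: {looks_unfiltered(top35, POOL_SIZE)}")
```

Under these assumptions, a top-35% submission should report a size of roughly 4.5M, so a reported 1.3e7 is more consistent with a missing or defaulted size value than with the filter actually keeping nearly everything.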