Per-dataset results of 21 standard zero-shot datasets

ytaek-oh / vl_compo


Per-dataset results of 21 standard zero-shot datasets #1

Closed vishaal27 closed 3 days ago

vishaal27 commented 4 days ago

Hey, thanks for your great work and for publicly releasing your results and code. I am very interested in the individual per-dataset (non-compositionality) results for all the models in your results.csv file. Would you be able to release each model's scores on each of the 21 datasets separately? That would be awesome, thanks in advance!

ytaek-oh commented 3 days ago

Hello Vishaal, thank you for your interest in our work! The detailed evaluation results for all tasks, including the 21 classification tasks, are in [individual_results.csv].

Each column corresponds to a benchmark, and :: delimits the individual scores within that benchmark's entry. For Winoground-style datasets that include sub-tasks (e.g., eqben and mmvp_vlm), an additional :: separates the 'text', 'image', and 'group' scores.
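
As a rough illustration of reading such entries, here is a minimal Python sketch that splits a ::-delimited entry into floats. It is not taken from the repository: the column name `some_benchmark`, the `model` column, the sample values, and the helper names `split_scores` and `group_triples` are all hypothetical, and the assumption that Winoground-style entries flatten into consecutive (text, image, group) triples should be checked against the actual file.

```python
import csv
import io

DELIMITER = "::"  # score delimiter described in the thread


def split_scores(cell: str) -> list[float]:
    """Split one '::'-delimited entry into floats, skipping empty pieces."""
    return [float(part) for part in cell.split(DELIMITER) if part.strip()]


def group_triples(scores: list[float]) -> list[dict[str, float]]:
    """Group scores into (text, image, group) triples.

    ASSUMPTION: Winoground-style entries are stored as consecutive
    text/image/group triples, in that order. Verify against the real CSV.
    """
    if len(scores) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 scores, got {len(scores)}")
    keys = ("text", "image", "group")
    return [dict(zip(keys, scores[i:i + 3])) for i in range(0, len(scores), 3)]


# Hypothetical sample that mimics the described layout; the numbers are
# placeholders, not real results.
SAMPLE = (
    "model,some_benchmark,eqben\n"
    "model_a,61.2::48.5,30.0::20.5::15.25\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
parsed = {
    row["model"]: {
        "some_benchmark": split_scores(row["some_benchmark"]),
        "eqben": group_triples(split_scores(row["eqben"])),
    }
    for row in rows
}
print(parsed["model_a"])
```

To use this on the real file, replace `io.StringIO(SAMPLE)` with an opened file handle and adjust the column names to match its header.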

The code will be uploaded soon, once the release is approved.

Thanks,

vishaal27 commented 3 days ago

Thanks so much, this is great!