Open ShunsukeOnoo opened 1 year ago
I'm also testing the zero-shot reasoning capability of Kosmos-2, and it's not as promising as what I read in the Kosmos-1 paper. Would you mind sharing your code for this evaluation on the CUB dataset so I can replicate more zero-shot experiments? Thank you very much.
Description
I am conducting research with Kosmos-2, aiming to replicate the Zero-Shot Image Classification with Descriptions task described in Section 4.7 of the Kosmos-1 paper (figure). Unfortunately, I am unable to match the performance reported for Kosmos-1. Since no performance figures for Kosmos-2 on this task have been published, I cannot tell whether the discrepancy stems from model differences or from my implementation.
Inquiries
Experimentation Details
For the replication study, I built a dataset analogous to the one described in the Kosmos-1 paper, using the CUB dataset from Hugging Face. My evaluation covers the woodpecker and sparrow pairs, with descriptions taken from Table 11 of the Kosmos-1 paper; the penguin pair was excluded because it is absent from the dataset. A generation counts as correct when the first species name appearing in the output matches the ground-truth name or an acceptable variant of it, after accounting for punctuation. Under this criterion, accuracy was 71.7% without descriptions and 61% with descriptions, which is the opposite of the trend reported for Kosmos-1.
I would be grateful for any guidance you can provide.