ys-zong / VL-ICL

Code for paper: VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning
https://ys-zong.github.io/VL-ICL/

Evaluation on Operator Induction is actually based on digits? #7

Closed kangzhiq closed 2 months ago

kangzhiq commented 2 months ago

Thanks for the great work!

I just noticed that the evaluation protocol for the Operator Induction dataset does not seem consistent with what is presented in the paper. The dataset was designed to induce the operator. However, in your implementation, the model is asked "What is the result of the following mathematical expression?" and the evaluation is based on the output digit.

Could you please elaborate on this? To me, it does not make sense to ask the model to guess the output of 2 ? 3 and say the ground truth is 6 instead of 5.

Thanks in advance!

kangzhiq commented 2 months ago

Sorry, I just got the idea of operator induction: to infer the operator from the context and calculate the correct output.

Closing this issue.

ys-zong commented 2 months ago

Exactly, that's right.
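
For readers landing on this thread, here is a minimal sketch of the task as described above: the operator is hidden, the in-context examples reveal which one it is, and the score is based on the final number. The candidate operator set, the function names (`induce_operator`, `answer_query`, `is_correct`), and the answer-parsing rule are all hypothetical illustrations and are not taken from the VL-ICL code.

```python
import operator
import re

# Hypothetical candidate set; the benchmark's actual operators are not listed in this thread.
CANDIDATES = {"+": operator.add, "-": operator.sub, "x": operator.mul}


def induce_operator(examples):
    """Return the symbols of every candidate consistent with all (a, b, result) examples."""
    return [
        symbol
        for symbol, fn in CANDIDATES.items()
        if all(fn(a, b) == result for a, b, result in examples)
    ]


def answer_query(examples, a, b):
    """Infer the hidden operator from the examples, then apply it to the query."""
    consistent = induce_operator(examples)
    if len(consistent) != 1:
        raise ValueError(f"examples do not pin down one operator: {consistent}")
    return CANDIDATES[consistent[0]](a, b)


def is_correct(model_output, target):
    """Assumed scoring rule: compare the first integer in the output with the target."""
    match = re.search(r"-?\d+", model_output)
    return match is not None and int(match.group()) == target


# In-context examples of the form "a ? b = result"; only multiplication fits both.
examples = [(1, 4, 4), (3, 5, 15)]
print(induce_operator(examples))     # ['x']
print(answer_query(examples, 2, 3))  # 6
```

With these examples, the query 2 ? 3 has ground truth 6 because the context implies multiplication; an answer of 5 would only be right if the examples had implied addition. A single example such as 2 ? 2 = 4 would be ambiguous under this sketch, since both addition and multiplication fit it.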