Open Violettttee opened 2 weeks ago
Hi @Violettttee, you can try the AI2D_TEST_NO_MASK dataset we provide; models generally score higher on it than on AI2D_TEST because of the different setting. However, we still cannot reproduce the numbers reported by OpenAI or Anthropic.
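Hello! Do you have any suggestions or thoughts on the especially high AI2D scores reported for OpenAI's models and Claude 3.5? I changed the evaluation setup and the prompt (adding CoT) and evaluated GPT several times, but could never reproduce the very high score of 0.942; the best score with CoT was only 0.83. What do you think explains this gap? (I see that none of your AI2D evaluation scores are above 0.9 either, so I am curious how Claude and GPT were measured at nearly perfect scores.)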