MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

paperswithlove / papers-we-read

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI #36

Open runhani opened 6 months ago

runhani commented 6 months ago

Maybe I should have read the original paper first after all?

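A total of 11,550 questions were collected.
They cover 6 broad disciplines, 30 college subjects, 183 subfields, and 30 image types.
1. Art & Design
2. Business
3. Science
4. Health & Medicine
5. Humanities & Social Science
6. Tech & Engineering
The image types are varied: diagrams, tables, plots, charts, chemical formulas, medical images, microscope images, and more.
So monitoring whether Expert AGI has been reached is considered important...
As expected, the conclusion is that there is significant room for improvement.
Average scores: GPT-4V 56, Gemini Ultra 59, and yet ... GPT-4o 69 ... (yikes)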

The benchmark does not only pair one image with one question; it also includes questions where text and images are interleaved.

So how was the data collected?

They recruited 50 college students.
The students gathered questions from textbooks and online sources, and wrote new questions where needed.

stat

	total	art & design	business	science	health & medicine	humanities & social sci.	tech & eng.
validation	900
test	10500	1163	1428	2426	1752	947	2784
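
As a quick sanity check on the table above, the per-discipline test counts can be summed and turned into shares. This is only a minimal sketch: the numbers are copied from the notes, while the dictionary keys and the helper name `discipline_shares` are made up for illustration. It also shows that validation plus test leaves 150 of the 11,550 questions unaccounted for; the notes do not say where those belong, so they may correspond to a separate small split that the table does not list (an assumption, not confirmed here).

```python
# Test-split question counts per discipline, copied from the table above.
test_counts = {
    "Art & Design": 1163,
    "Business": 1428,
    "Science": 2426,
    "Health & Medicine": 1752,
    "Humanities & Social Sci.": 947,
    "Tech & Eng.": 2784,
}

TOTAL_QUESTIONS = 11_550   # overall total quoted in the notes
VALIDATION_TOTAL = 900     # validation row of the table
TEST_TOTAL = 10_500        # test row of the table


def discipline_shares(counts):
    """Return each discipline's share of the split in percent, rounded to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The per-discipline counts should add up to the test total.
test_sum = sum(test_counts.values())
shares = discipline_shares(test_counts)

# Questions not covered by the validation and test rows
# (possibly a separate split; this is an assumption).
unlisted = TOTAL_QUESTIONS - VALIDATION_TOTAL - TEST_TOTAL

print(f"test sum: {test_sum} (table says {TEST_TOTAL})")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"not listed in the table: {unlisted}")
```

The test split is clearly not balanced: Tech & Engineering and Science together make up roughly half of it, so an unweighted average over all test questions leans toward those two disciplines.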

conclusions

Strong performance on MMMU does not by itself make a model AGI,
but an Expert AGI would need to achieve strong performance on MMMU.