jind11 / MedQA

Code and data for MedQA
MIT License

Human baseline #3

Open vlievin opened 2 years ago

vlievin commented 2 years ago

Hi! Is there a known human baseline for this dataset (open and closed book)? Or maybe the required score to pass the exam?

jind11 commented 2 years ago

Nope, but you can use 60 out of 100 as a reference score, which is the passing score for the Med Exam.

vlievin commented 2 years ago

Thank you for the info! So, just to make sure we are aligned here: does that mean 60% answering accuracy for the US, TW and MC datasets?

jind11 commented 2 years ago

Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.
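
For anyone who wants to check a model against that reference, here is a minimal sketch, assuming predictions and gold answers are stored as parallel lists of option letters. The function names and the sample data are hypothetical and are not part of the MedQA code; the 0.60 threshold is only the passing score mentioned above, not a measured human baseline.

```python
# Reference passing score mentioned in this thread (60 out of 100).
# This is an exam passing mark, not a measured human accuracy on the dataset.
PASSING_ACCURACY = 0.60


def accuracy(predictions, answers):
    """Return the fraction of predictions that match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def meets_reference(predictions, answers, threshold=PASSING_ACCURACY):
    """Return True if accuracy is at or above the reference threshold."""
    return accuracy(predictions, answers) >= threshold


# Hypothetical toy data: 3 of 5 answers match.
preds = ["A", "C", "B", "D", "A"]
gold = ["A", "C", "D", "D", "B"]

print(f"accuracy = {accuracy(preds, gold):.2f}")
print(f"meets 60% reference: {meets_reference(preds, gold)}")
```

The same check would be run separately on each of the US, TW and MC test splits, since the reference is meant to apply to each dataset.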

miraculixx commented 2 months ago

> Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.

Can you elaborate on your reasoning here please? It seems a somewhat flawed heuristic to assume human performance on this particular QA dataset [which to my understanding is generated?] would be approximately equivalent to the outcome of humans being tested in actual exams.