modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Apache License 2.0

Meaning of the "aAcc", "fAcc", and "qAcc" metrics on the HallusionBench dataset #104

Closed: stay-leave closed this issue 3 weeks ago

stay-leave commented 1 month ago

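Hi, a quick question. I ran an evaluation with evalscope on the HallusionBench dataset and got this report: [{'InternVL2-26B-DPO_HallusionBench_score': {'split': 'Overall', 'aAcc': '59.89473684210527', 'fAcc': '34.39306358381503', 'qAcc': '33.62637362637363'}}] I don't understand what these metrics mean, and I couldn't find an explanation in the official GitHub repo either: https://github.com/tianyi-lab/HallusionBench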

Yunnglin commented 1 month ago

aAcc is Accuracy per Question, i.e. the average accuracy over all questions.
fAcc is Accuracy per Figure.
qAcc is Accuracy per Question Pair.
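To make the difference between the three numbers concrete, here is a minimal sketch of how they could be computed from per-question results. It is not evalscope's actual implementation. It assumes that a figure (or a question pair) only counts as correct when every question grouped under it is answered correctly, which would explain why fAcc and qAcc come out much lower than aAcc in the report above. The function name and the record fields `figure_key`, `pair_key`, and `correct` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hallusion_style_scores(records):
    """Compute aAcc / fAcc / qAcc (in percent) from per-question records.

    Each record is a dict with hypothetical keys:
      'figure_key': id of the figure the question is asked about
      'pair_key':   id of the question pair the question belongs to
      'correct':    whether the model answered this question correctly
    """
    records = list(records)
    by_figure = defaultdict(list)
    by_pair = defaultdict(list)
    for r in records:
        by_figure[r["figure_key"]].append(bool(r["correct"]))
        by_pair[r["pair_key"]].append(bool(r["correct"]))

    def pct(hits, total):
        # Percentage, guarding against an empty input.
        return 100.0 * hits / total if total else 0.0

    return {
        # Plain per-question accuracy.
        "aAcc": pct(sum(bool(r["correct"]) for r in records), len(records)),
        # Assumption: a figure is correct only if all its questions are correct.
        "fAcc": pct(sum(all(v) for v in by_figure.values()), len(by_figure)),
        # Assumption: a pair is correct only if all its questions are correct.
        "qAcc": pct(sum(all(v) for v in by_pair.values()), len(by_pair)),
    }


# Toy data: 4 questions, 2 figures, 2 question pairs.
records = [
    {"figure_key": "fig0", "pair_key": "q0", "correct": True},
    {"figure_key": "fig0", "pair_key": "q1", "correct": True},
    {"figure_key": "fig1", "pair_key": "q0", "correct": False},
    {"figure_key": "fig1", "pair_key": "q1", "correct": True},
]
scores = hallusion_style_scores(records)
print(scores)
```

In this toy data a single wrong answer lowers aAcc by only one question out of four, but it invalidates both the figure and the question pair it belongs to, so the two grouped metrics drop faster than the per-question one.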

Reference: