Closed zhimin-z closed 6 months ago
Thanks for the attention. The site was previously inaccessible because the domain registration was being filed; it is accessible now.
In addition, OpenEval is not a benchmark but an evaluation platform for large Chinese models, and it includes several Chinese benchmarks.
Hi, thanks for your questions. Regarding the GitHub issue: the leaderboard entries have not been built yet because OpenEval is still being fixed, and we are discussing this internally, so please wait for now. As for the two domain benchmarks: "罪名法务智能数据集" (a criminal-charge legal-intelligence dataset) is being tested now, but we have found a better law benchmark for LLMs, so we may replace it. "WGLaw" has already been tested, and we will release its results shortly; this benchmark was collected from the Yellow River Conservancy Commission of the Ministry of Water Resources. However, we find that the general results are still poor, and we will add the new results for these domain benchmarks soon.
Where are the results of these two domain-evaluation benchmarks on the OpenEval leaderboard?

![image](https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers/assets/8592144/ee47522d-83c5-4f90-9626-535300e91adc)