tjunlp-lab / Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

SeaEval: Multilingual LLM Evaluation #7

Open BinWang28 opened 8 months ago

BinWang28 commented 8 months ago

We would like to bring our evaluation paper to your attention; it could serve as an important building block for multilingual evaluation and cultural understanding.

SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning

cordercorder commented 8 months ago

Thank you for bringing this paper to our attention. The focus on multilingual evaluation and cultural understanding is indeed an important aspect of evaluating multilingual LLMs. The paper provides a comprehensive evaluation of the multilingual capabilities of LLMs, examining several dimensions, including reasoning, language proficiency, and cultural comprehension.

However, our current taxonomy for LLM evaluation, while extensive, is not yet exhaustive. As such, we have not been able to identify an appropriate category within our existing framework to accommodate this insightful paper. We are committed to continually refining our survey and taxonomy for LLM evaluation, and we plan to update our paperlist to include this paper.

BinWang28 commented 7 months ago

Thanks a lot. Yes, multilingual evaluation is a major area that has received less attention than English-only evaluation. It would be great to have some discussion on this to make the survey more comprehensive. Paper, Leaderboard Website, Data, Code

Besides, another paper, FacEval, on truthfulness can also be included. Its related works (FactCC, TruthfulQA, FRANK, FEQA, and FaithDial) are included in the same branch. I will open a pull request for that.

cordercorder commented 7 months ago

We greatly appreciate your attention and ongoing contributions to our repository. The pull request you submitted has been successfully merged, and the associated paper will be incorporated into the forthcoming version of our survey.

As we strive for continuous improvement, we are in the process of refining our survey and the corresponding paperlist to enhance the comprehensiveness of our work, with a particular emphasis on multilingual evaluation.

Thank you once again for your valuable contributions.

zhimin-z commented 7 months ago

Hi @BinWang28, thanks for your issue. I attempted to click on the Leaderboard Website link, but there is no leaderboard there. How can I access the leaderboard information?

BinWang28 commented 7 months ago

Yes, please refer to the new website: https://seaeval.github.io/

The results for each dataset are shown under the respective category, e.g., https://seaeval.github.io/cross_lingual_consistency.html

zhimin-z commented 7 months ago

Thanks. This one is hard to find; it might be better to label it explicitly as "Leaderboard" so people can find it easily.

BinWang28 commented 7 months ago

Thanks for the suggestion. We will consider how to make it more visible.