BinWang28 opened this issue 1 year ago
Thank you for bringing this paper to our attention. The focus on multilingual evaluation and cultural understanding is indeed an important aspect of evaluating multilingual LLMs. The paper provides a comprehensive evaluation of multilingual LLMs across several dimensions, including reasoning, language proficiency, and cultural comprehension.
However, our current taxonomy for LLM evaluation, while extensive, is not yet exhaustive. As such, we have not been able to identify an appropriate category within our existing framework to accommodate this insightful paper. We are committed to continually refining our survey and taxonomy for LLM evaluation, and we plan to update our paperlist to include this paper.
Thanks a lot. Yes, multilingual evaluation is a major area that has received less attention than English-only evaluation. It would be great to have some discussion of this to make the survey more comprehensive. Paper, Leaderboard Website, Data, Code
Besides, another paper, FacEval, on truthfulness could also be included. Its related works (FactCC, TruthfulQA, FRANK, FEQA, and FaithDial) are already included in the same branch. I will open a pull request for that.
We greatly appreciate your attention and ongoing contributions to our repository. The pull request you submitted has been successfully merged, and the associated paper will be incorporated into the forthcoming version of our survey.
As we strive for continuous improvement, we are in the process of refining our survey and the corresponding paperlist to enhance the comprehensiveness of our work, with a particular emphasis on multilingual evaluation.
Thank you once again for your valuable contributions.
Hi @BinWang28, thanks for your issue. I tried clicking on the Leaderboard Website link, but there is no leaderboard there. How can I access the leaderboard information?
Yes. Please refer to the new website: https://seaeval.github.io/
The results for each dataset are shown under the respective category, e.g. https://seaeval.github.io/cross_lingual_consistency.html
Thanks. This one is hard to find; it might be better to have an explicit link named "leaderboard" so people can find it easily.
Thanks for the suggestion. We will consider how to make it easier to find.
Please see our paper on evaluation, which could be an important building block for multilingual evaluation and cultural understanding:
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning