iEvals is a framework for evaluating Chinese large language models (LLMs), with a focus on performance in the Traditional Chinese domain. Our goal is to provide an easy-to-set-up, fast evaluation library for gauging the performance of existing Chinese LLMs and guiding their use.
Currently, we only support evaluation on TMMLU+. In the future, we plan to cover more domains, i.e., knowledge-intensive datasets (CMMLU, C-Eval) as well as context retrieval and multi-turn conversation datasets.
pip install git+https://github.com/ikala-corp/ievals.git
ieval <model name> <series: optional> --top_k <number of in-context examples>
For more details, please refer to the models section.
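To make the argument order in the usage line concrete, here is a minimal Python sketch that assembles an `ieval` command from its parts. The helper `build_ieval_command` and the model name `my-model` are hypothetical placeholders, not part of iEvals. Treating `--top_k` as optional is also an assumption. The sketch only builds and prints the command; it does not run it.

```python
import shlex
from typing import List, Optional


def build_ieval_command(
    model_name: str,
    series: Optional[str] = None,
    top_k: Optional[int] = None,
) -> List[str]:
    """Build the argument list for: ieval <model name> <series> --top_k <n>.

    This helper is illustrative only and is not shipped with iEvals.
    """
    if not model_name:
        raise ValueError("model_name must be a non-empty string")

    argv = ["ieval", model_name]

    # The series is a positional argument that the usage line marks as optional.
    if series:
        argv.append(series)

    # --top_k is the number of in-context examples.
    # Assumption: the flag may be left out entirely.
    if top_k is not None:
        if top_k < 0:
            raise ValueError("top_k must be non-negative")
        argv.extend(["--top_k", str(top_k)])

    return argv


if __name__ == "__main__":
    # "my-model" is a placeholder; see the models section for real names.
    command = build_ieval_command("my-model", top_k=5)
    print(shlex.join(command))
```

To actually launch the evaluation, the returned list can be passed to `subprocess.run(command, check=True)` once iEvals is installed.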
Chain of Thought (CoT) with few-shot prompting
arXiv paper: detailed analysis of model interior and exterior relations
More tasks
@article{ikala2023eval,
title={An Improved Traditional Chinese Evaluation Suite for Foundation Model},
author={Tam, Zhi-Rui and Pai, Ya-Ting},
journal={arXiv},
year={2023}
}
This is not an officially supported iKala product.
This research code is provided "as-is" to the broader research community. iKala does not promise to maintain or otherwise support this code in any way.