tjunlp-lab / Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
688 stars, 42 forks

Add MINT-Bench #6

Closed: xingyaoww closed this 11 months ago

xingyaoww commented 11 months ago

Hi there,

Thanks for the effort in putting together this repo!

We opened this PR to add our work on evaluating LLM agents in multi-turn interaction. Website: https://xingyaoww.github.io/mint-bench/.

cordercorder commented 11 months ago

Thanks for your attention and invaluable contribution to our repository.