lm-sys arena-hard-auto issues

lm-sys / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.

Apache License 2.0

312 stars 29 forks

issues

edit README

#32 connorchenn opened 5 hours ago
1
add export csv option in show_result.py

#31 connorchenn closed 8 hours ago
2
Add TGI to readme

#30 karthik-nexusflow closed 1 week ago
1
Can you add deepseek-coder-v2?

#29 Kreijstal closed 1 week ago
1
Fix corner-case in token length calculation when the model generates tiktoken special tokens like `<|endoftext|>`

#28 sxjscience closed 2 weeks ago
2
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

#27 xiamengzhou opened 3 weeks ago
6
Is there any plan to share the full dataset (200k prompts) with the "number of hardness criteria met" label ? I think it would be quite useful to the community

#26 alexchapeaux closed 3 weeks ago
1
How to add new models to the leaderboard?

#25 chujiezheng opened 1 month ago
2
Update gen_judgment.py

#24 r4dm closed 1 month ago
1
Fix `winner` in `show_result.py` for game 2

#23 alvarobartt closed 1 month ago
2
Bradley-Terry model

#22 dmitrysarov closed 1 month ago
1
configurable parameters

#21 dmitrysarov closed 2 weeks ago
5
Local model as a judge

#20 r4dm closed 1 month ago
5
[Q] About hosting `arena-hard-v0.1/question.json` in the Hugging Face Hub

#19 alvarobartt closed 1 month ago
2
[Bug] Temperature is always `0.0`

#18 bcui19 closed 1 month ago
1
Multi-threads generation support ?

#17 Ignoramus0817 closed 1 month ago
1
Discrepancy in Scores When Switching GPT Model Versions

#16 wlhgtc closed 1 month ago
6
[Discussion] Methodology for bootstrapping with replacement to obtain separability confidence intervals

#15 justinxzhao closed 1 month ago
2
Bug in get_battles_from_judgment

#14 tangbinh closed 2 months ago
1
[Feature] support arena-hard in opencompass

#13 bittersweet1999 opened 2 months ago
2
docs: add `git-lfs` note in `README.md`

#12 xukai92 closed 2 months ago
0
add missing deps for `show_result.py`

#11 xukai92 closed 2 months ago
1
Models testing themselves will always be biased.

#10 HideLord closed 2 months ago
1
Allow to set generation sampling parameters

#9 psinger closed 2 weeks ago
11
CI results different for same model answer copy

#8 qingquansong closed 2 months ago
2
Evaluate local models

#7 xiamengzhou closed 2 months ago
2
Only support baseline=True and pairwise=True?

#6 GradientGuru closed 2 months ago
1
Majority of questions are coding questions!

#5 nxphi47 closed 2 months ago
2
Markdown Rendering Issue

#4 suquark closed 2 months ago
1
Fix the order of questions.jsonl on Huggingface

#3 infwinston closed 2 months ago
0
QA browser does not work properly for me

#2 suquark closed 2 months ago
3
reproduce+improve instructions

#1 DachengLi1 closed 6 months ago
1