lm-sys/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
312 stars, 29 forks
issues
#32 edit README (connorchenn, opened, 5 hours ago, 1 comment)
#31 add export csv option in show_result.py (connorchenn, closed, 8 hours ago, 2 comments)
#30 Add TGI to readme (karthik-nexusflow, closed, 1 week ago, 1 comment)
#29 Can you add deepseek-coder-v2? (Kreijstal, closed, 1 week ago, 1 comment)
#28 Fix corner-case in token length calculation when the model generates tiktoken special tokens like `<|endoftext|>` (sxjscience, closed, 2 weeks ago, 2 comments)
#27 STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. (xiamengzhou, opened, 3 weeks ago, 6 comments)
#26 Is there any plan to share the full dataset (200k prompts) with the "number of hardness criteria met" label ? I think it would be quite useful to the community (alexchapeaux, closed, 3 weeks ago, 1 comment)
#25 How to add new models to the leaderboard? (chujiezheng, opened, 1 month ago, 2 comments)
#24 Update gen_judgment.py (r4dm, closed, 1 month ago, 1 comment)
#23 Fix `winner` in `show_result.py` for game 2 (alvarobartt, closed, 1 month ago, 2 comments)
#22 Bradley-Terry model (dmitrysarov, closed, 1 month ago, 1 comment)
#21 configurable parameters (dmitrysarov, closed, 2 weeks ago, 5 comments)
#20 Local model as a judge (r4dm, closed, 1 month ago, 5 comments)
#19 [Q] About hosting `arena-hard-v0.1/question.json` in the Hugging Face Hub (alvarobartt, closed, 1 month ago, 2 comments)
#18 [Bug] Temperature is always `0.0` (bcui19, closed, 1 month ago, 1 comment)
#17 Multi-threads generation support ? (Ignoramus0817, closed, 1 month ago, 1 comment)
#16 Discrepancy in Scores When Switching GPT Model Versions (wlhgtc, closed, 1 month ago, 6 comments)
#15 [Discussion] Methodology for bootstrapping with replacement to obtain separability confidence intervals (justinxzhao, closed, 1 month ago, 2 comments)
#14 Bug in get_battles_from_judgment (tangbinh, closed, 2 months ago, 1 comment)
#13 [Feature] support arena-hard in opencompass (bittersweet1999, opened, 2 months ago, 2 comments)
#12 docs: add `git-lfs` note in `README.md` (xukai92, closed, 2 months ago, 0 comments)
#11 add missing deps for `show_result.py` (xukai92, closed, 2 months ago, 1 comment)
#10 Models testing themselves will always be biased. (HideLord, closed, 2 months ago, 1 comment)
#9 Allow to set generation sampling parameters (psinger, closed, 2 weeks ago, 11 comments)
#8 CI results different for same model answer copy (qingquansong, closed, 2 months ago, 2 comments)
#7 Evaluate local models (xiamengzhou, closed, 2 months ago, 2 comments)
#6 Only support baseline=True and pairwise=True? (GradientGuru, closed, 2 months ago, 1 comment)
#5 Majority of questions are coding questions! (nxphi47, closed, 2 months ago, 2 comments)
#4 Markdown Rendering Issue (suquark, closed, 2 months ago, 1 comment)
#3 Fix the order of questions.jsonl on Huggingface (infwinston, closed, 2 months ago, 0 comments)
#2 QA browser does not work properly for me (suquark, closed, 2 months ago, 3 comments)
#1 reproduce+improve instructions (DachengLi1, closed, 6 months ago, 1 comment)