lmarena/arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
656 stars
74 forks
issues
generate answers for FT models
#53
baishalichaudhury
closed
1 week ago
2
Can you provide a CSV file of the newest Chatbot Arena LLM Leaderboard (2024-11-04)?
#52
efsotr
closed
1 week ago
0
Choosing weight and num_round setting for evaluation
#51
YJWon99
closed
1 week ago
1
About the style control leaderboard
#50
yangzy39
closed
2 weeks ago
3
The Replacement of Open-source JudgeModel/Evaluator
#49
bittersweet1999
opened
1 month ago
8
chore: update show_result.py
#48
eltociear
opened
1 month ago
0
another hard prompt
#47
maninthemiddle01
closed
2 months ago
1
Improve reproducibility in utils_math.py
#46
dustalov
opened
2 months ago
0
Can you release the model's answers and judgments for the models you ran on your benchmark?
#45
AsafYehudai
closed
2 months ago
1
Add litellm, unified dataclass description, and compatibility with vision-language models
#44
BabyChouSr
closed
1 month ago
2
Question about Llama-3.1-405b-instruct's results
#43
snova-bol
closed
2 months ago
4
Add filter step in BenchBuilder
#42
BabyChouSr
closed
2 months ago
1
Add support for vision-language conversations
#41
BabyChouSr
closed
2 months ago
2
Inquire about the process for submitting our model to be included on the leaderboard.
#40
PKU-Baichuan
opened
3 months ago
0
Conv should be defined within choice loop
#39
zankner
closed
2 months ago
1
added merge leaderboard function
#38
connorchenn
closed
3 months ago
1
added gpt 4o mini to leaderboard
#37
connorchenn
closed
4 months ago
0
updated leaderboard
#36
connorchenn
closed
4 months ago
1
remove leaderboard from root directory
#35
connorchenn
closed
4 months ago
1
Fix typo in README
#34
PaperPlaneDeemo
closed
4 months ago
1
new README
#33
connorchenn
closed
4 months ago
5
edit README
#32
connorchenn
closed
4 months ago
1
add export csv option in show_result.py
#31
connorchenn
closed
4 months ago
2
Add TGI to readme
#30
karthik-nexusflow
closed
5 months ago
1
Can you add deepseek-coder-v2?
#29
Kreijstal
closed
5 months ago
1
Fix corner-case in token length calculation when the model generates tiktoken special tokens like `<|endoftext|>`
#28
sxjscience
closed
5 months ago
2
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
#27
xiamengzhou
closed
4 months ago
6
Is there any plan to share the full dataset (200k prompts) with the "number of hardness criteria met" label ? I think it would be quite useful to the community
#26
alexchapeaux
closed
5 months ago
1
How to add new models to the leaderboard?
#25
chujiezheng
opened
5 months ago
2
Fix `winner` in `show_result.py` for game 2
#23
alvarobartt
closed
5 months ago
2
Bradley-Terry model
#22
dmitrysarov
closed
6 months ago
1
configurable parameters
#21
dmitrysarov
closed
5 months ago
5
[Q] About hosting `arena-hard-v0.1/question.json` in the Hugging Face Hub
#19
alvarobartt
closed
2 months ago
4
[Bug] Temperature is always `0.0`
#18
bcui19
closed
6 months ago
1
Multi-threads generation support ?
#17
Ignoramus0817
closed
6 months ago
1
Discrepancy in Scores When Switching GPT Model Versions
#16
wlhgtc
closed
6 months ago
10
[Discussion] Methodology for bootstrapping with replacement to obtain separability confidence intervals
#15
justinxzhao
closed
6 months ago
2
Bug in get_battles_from_judgment
#14
tangbinh
closed
6 months ago
1
[Feature] support arena-hard in opencompass
#13
bittersweet1999
closed
3 months ago
2
docs: add `git-lfs` note in `README.md`
#12
xukai92
closed
7 months ago
0
add missing deps for `show_result.py`
#11
xukai92
closed
7 months ago
1
Models testing themselves will always be biased.
#10
HideLord
closed
7 months ago
1
Allow to set generation sampling parameters
#9
psinger
closed
5 months ago
11
CI results different for same model answer copy
#8
qingquansong
closed
7 months ago
2
Evaluate local models
#7
xiamengzhou
closed
7 months ago
2
Only support baseline=True and pairwise=True?
#6
GradientGuru
closed
7 months ago
1
Majority of questions are coding questions!
#5
nxphi47
closed
7 months ago
2
Markdown Rendering Issue
#4
suquark
closed
7 months ago
1
Fix the order of questions.jsonl on Huggingface
#3
infwinston
closed
7 months ago
0
QA browser does not work properly for me
#2
suquark
closed
7 months ago
3