princeton-nlp / HELMET

The HELMET Benchmark

https://arxiv.org/abs/2410.02694

MIT License

75 stars, 9 forks

Issues, newest first

How to evaluate the base models?

#10 canghongjian closed 1 week ago (2 comments)
Evaluation with larger batch size

#9 prakamya-mishra closed 2 weeks ago (1 comment)
Reproducing results on Llama-3.1-8B-Inst

#8 chtmp223 opened 1 month ago (3 comments)
ALCE citation evaluation

#7 carriex opened 1 month ago (3 comments)
Discrepancy in gpt4o-mini Results on MSMarco Compared to Reported Results

#6 8188zq opened 1 month ago (9 comments)
fix: add trust_remote_code for banking77

#5 Wangmerlyn closed 1 month ago (1 comment)
Is the code of eval_gpt4_longqa.sh correct?

#4 enze5088 closed 1 month ago (1 comment)
Making the full eval sheet a read-only excel

#3 LeoXinhaoLee closed 1 month ago (1 comment)
Download data link

#2 maxjeblick closed 1 month ago (1 comment)
Missing requirements

#1 maxjeblick closed 1 month ago (1 comment)