hsiehjackson / RULER
This repo contains the source code for RULER: What's the Real Context Size of Your Long-Context Language Models?
Apache License 2.0 · 319 stars · 17 forks
Issues (sorted newest first)
| #   | Title                                                                                    | Author          | Status | When         | Comments |
|-----|------------------------------------------------------------------------------------------|-----------------|--------|--------------|----------|
| #34 | 128K sequence length means 131072 or 128000                                              | syp1997         | open   | 3 days ago   | 1        |
| #33 | Qwen2 and DeepSeek-V2 results?                                                           | hijkzzz         | open   | 6 days ago   | 1        |
| #32 | Add SGLang backend                                                                       | Ying1123        | closed | 2 days ago   | 0        |
| #31 | Base vs Chat prompt question.                                                            | karansaxena     | closed | 2 weeks ago  | 3        |
| #30 | Prediction format during evals                                                           | karansaxena     | closed | 3 weeks ago  | 5        |
| #29 | pre_sample in qa code                                                                    | vkaul11         | closed | 2 weeks ago  | 1        |
| #28 | request for evaluating GLM4-9B-chat(-1M)                                                 | yucc-leon       | closed | 2 weeks ago  | 2        |
| #27 | questions about ICL code for variable tracking                                           | vkaul11         | open   | 3 weeks ago  | 1        |
| #26 | Is there any issue in extending context length to 1 million using your script            | vkaul11         | open   | 3 weeks ago  | 1        |
| #25 | What is the need for is_icl parameter?                                                   | vkaul11         | open   | 3 weeks ago  | 1        |
| #24 | lost in the middle problem                                                               | vkaul11         | open   | 3 weeks ago  | 1        |
| #23 | how do you take care of the presence of 'and' in the output in the evaluation            | vkaul11         | open   | 4 weeks ago  | 1        |
| #22 | prediction evaluation statistics                                                         | vkaul11         | open   | 4 weeks ago  | 4        |
| #21 | Why do you need to separate the last batch of the output                                 | vkaul11         | open   | 1 month ago  | 1        |
| #20 | Add answer_predfix to prevent model from refusing to answer typo?                        | vkaul11         | open   | 1 month ago  | 2        |
| #19 | what was the reason to use nltk in NIAK task here                                        | vkaul11         | closed | 1 month ago  | 3        |
| #18 | dataset argument for qa.py not specified                                                 | vkaul11         | closed | 1 month ago  | 2        |
| #17 | Yuzhe                                                                                    | zyzzzz-123      | closed | 1 month ago  | 0        |
| #16 | Question about files nouns.list and verbs.list                                           | vkaul11         | closed | 1 month ago  | 0        |
| #15 | Why do you use partial match max metric for QA                                           | vkaul11         | closed | 1 month ago  | 1        |
| #14 | How to test models with larger context length than 128K ?                                | yaswanth-iitkgp | open   | 1 month ago  | 10       |
| #13 | Tempate for Yi?                                                                          | liyucheng09     | closed | 1 month ago  | 2        |
| #12 | gpt-4o results?                                                                          | the21st         | open   | 1 month ago  | 1        |
| #11 | No Generated Output and JSON Serialization Error when calling llm directly in VLLMClient | yaswanth-iitkgp | open   | 1 month ago  | 2        |
| #10 | Raw scores?                                                                              | WesleyYue       | open   | 1 month ago  | 2        |
| #9  | Do we have any ideia how many tokens is used to run the full benchmark in a model?       | daniellefranca96 | closed | 1 month ago | 1        |
| #8  | Why is multi_key_2 and 3 with only 1 key?                                                | jzhang38        | closed | 1 month ago  | 1        |
| #7  | Time taken on 8 A100?                                                                    | jzhang38        | closed | 1 month ago  | 2        |
| #6  | Llama 3 rope theta                                                                       | ganler          | closed | 1 month ago  | 4        |
| #5  | niah.py hang with hf models                                                              | hijkzzz         | closed | 2 months ago | 4        |
| #4  | How to evaluate the performance of RWKV or Jamba?                                        | hijkzzz         | closed | 2 months ago | 0        |
| #3  | Score is always 0.0, and it takes so long to prepare the dataset                         | YJHMITWEB       | closed | 2 months ago | 1        |
| #2  | Show Gemini Pro results                                                                  | s-macke         | closed | 2 months ago | 2        |
| #1  | When will the codes be release                                                           | Mooler0410      | closed | 2 months ago | 3        |