logikon-ai/cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License
5 stars, 1 fork
Issues (sorted by newest)
#56 Evaluate: Gemma 2 (ggbetz, opened 1 week ago, 0 comments)
#55 Evaluate: CohereForAI/aya-23-XXB (ggbetz, opened 1 month ago, 1 comment)
#54 Transformers version (ggbetz, opened 1 month ago, 0 comments)
#53 Evaluate: 01-ai/Yi-1.5-34B-Chat (ggbetz, closed 3 weeks ago, 1 comment)
#52 Evaluate: tiiuae/falcon-11B (ggbetz, opened 1 month ago, 1 comment)
#51 Evaluate: microsoft/Phi-3 (ggbetz, opened 1 month ago, 1 comment)
#50 Evaluate: nvidia/nemotron-3-8b-XXX (ggbetz, closed 1 month ago, 2 comments)
#49 Evaluate: meta-llama/Meta-Llama-3-XXX (ggbetz, closed 1 month ago, 0 comments)
#48 Why are many CoT reasoning traces empty? (ggbetz, opened 2 months ago, 1 comment)
#47 Evaluate: jetmoe/jetmoe-8b | -8b-sft | -8b-chat (ggbetz, opened 2 months ago, 2 comments)
#46 new cot-eval-traces ds structure (ggbetz, closed 2 months ago, 0 comments)
#45 Evaluate: Qwen/Qwen1.5-MoE-XX (ggbetz, opened 2 months ago, 1 comment)
#44 Evaluate: CohereForAI/c4ai-command-r-plus (ggbetz, opened 2 months ago, 5 comments)
#43 Evaluate: core42/jais-XX (ggbetz, opened 2 months ago, 2 comments)
#42 Evaluate: internlm/internlm2-math-XX (ggbetz, closed 1 month ago, 0 comments)
#41 Evaluate: internlm/internlm2-XX (ggbetz, closed 1 month ago, 0 comments)
#40 Evaluate: CohereForAI/c4ai-command-r-v01 (ggbetz, opened 2 months ago, 0 comments)
#39 Evaluate: allenai/OLMo-1B (ggbetz, opened 2 months ago, 1 comment)
#38 Evaluate: openbmb/Eurus-7b-kto (ggbetz, closed 2 months ago, 0 comments)
#37 Evaluate: openbmb/Eurus-70b-sft (ggbetz, closed 2 months ago, 2 comments)
#36 Evaluate: databricks/dbrx-instruct (ggbetz, closed 3 weeks ago, 1 comment)
#35 Evaluate: databricks/dbrx-base (ggbetz, opened 3 months ago, 0 comments)
#34 Evaluate: ai21labs/Jamba-v0.1 (ggbetz, opened 3 months ago, 0 comments)
#33 Evaluate: 01-ai/Yi-34B-Chat (ggbetz, closed 1 month ago, 1 comment)
#32 Evaluate: allenai/tulu-2-dpo-7b (ggbetz, closed 3 months ago, 0 comments)
#31 Evaluate: Qwen/Qwen-72B-Chat (ggbetz, closed 2 months ago, 3 comments)
#30 Evaluate: openchat/openchat-3.5-0106-gemma (ggbetz, opened 3 months ago, 2 comments)
#29 Evaluate: upstage/SOLAR-10.7B-Instruct-v1.0 (ggbetz, closed 2 months ago, 0 comments)
#28 Evaluate: upstage/SOLAR-10.7B-v1.0 (ggbetz, closed 2 months ago, 0 comments)
#27 Evaluate: Qwen/Qwen1.5-XX-Chat (ggbetz, opened 3 months ago, 1 comment)
#26 Evaluate: Qwen/Qwen1.5-XX (ggbetz, opened 3 months ago, 2 comments)
#25 Evaluate: 01-ai/Yi-34B (ggbetz, closed 2 months ago, 0 comments)
#24 Evaluate: google/gemma-7b-it (ggbetz, closed 2 months ago, 0 comments)
#23 Evaluate: google/gemma-7b (ggbetz, closed 1 month ago, 6 comments)
#22 Evaluate: NousResearch/Nous-Hermes-Llama2-13b (ggbetz, closed 3 months ago, 3 comments)
#21 Evaluate: allenai/tulu-2-dpo-13b (ggbetz, closed 3 months ago, 1 comment)
#20 Evaluate: allenai/tulu-2-13b (ggbetz, closed 3 months ago, 1 comment)
#19 Evaluate: meta-llama/Llama-2-13b-chat-hf (ggbetz, closed 3 months ago, 1 comment)
#18 Evaluate: meta-llama/Llama-2-13b-hf (ggbetz, closed 3 months ago, 1 comment)
#17 Evaluate: NousResearch/Nous-Hermes-Llama2-70b (ggbetz, closed 3 months ago, 2 comments)
#16 Evaluate: meta-llama/Llama-2-70b-hf (ggbetz, closed 3 months ago, 1 comment)
#15 Evaluate: meta-llama/Llama-2-70b-chat-hf (ggbetz, closed 3 months ago, 1 comment)
#14 Evaluate: allenai/tulu-2-dpo-70b (ggbetz, closed 3 months ago, 1 comment)
#13 Evaluate: mistralai/Mixtral-8x7B-Instruct-v0.1 (ggbetz, closed 2 months ago, 0 comments)
#12 Evaluate: mistralai/Mixtral-8x7B-v0.1 (ggbetz, closed 2 months ago, 0 comments)
#11 Evaluate: allenai/tulu-2-70b (ggbetz, closed 3 months ago, 1 comment)
#10 not enough swap space issue (ggbetz, closed 3 months ago, 3 comments)
#9 wandb integration (ggbetz, opened 3 months ago, 0 comments)
#8 running evals in parallel (ggbetz, closed 3 months ago, 1 comment)
#7 check validity of token early on, not after traces have been generated (ggbetz, opened 3 months ago, 0 comments)