logikon-ai/cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License
12 stars, 2 forks
issues
#70 · min_p decoding · ggbetz · opened 1 month ago · 0
#69 · mistralai/Mixtral-8x22B-Instruct-v0.1 / tokenizer · ggbetz · opened 1 month ago · 0
#68 · config_mistralai+Mixtral-8x22B-Instruct-v0.1.env / tokenizer · ggbetz · closed 1 month ago · 0
#67 · mistralai/Mistral-Large-Instruct-2407 doesn't load · ggbetz · opened 1 month ago · 0
#66 · mistralai+Mistral-7B-Instruct-v0.3 tokenizer / lm-eval · ggbetz · opened 1 month ago · 0
#65 · 01-ai/Yi-1.5-6B-Chat bug · ggbetz · opened 1 month ago · 0
#64 · internlm/internlm2_5-1_8b-chat lm-eval bug · ggbetz · opened 1 month ago · 0
#63 · mistralai/Mixtral-8x22B-Instruct-v0.1 fails · ggbetz · closed 1 month ago · 1
#62 · mistralai/Mistral-Nemo-Instruct-2407 vllm-server crashes · ggbetz · opened 1 month ago · 1
#61 · chat-template problem with allenai/OLMo-1B-0724-hf · ggbetz · opened 1 month ago · 0
#60 · Workflow orchestration framework · ggbetz · opened 1 month ago · 1
#59 · Beam search · ggbetz · closed 1 month ago · 1
#58 · Reflection prompt · ggbetz · closed 1 month ago · 1
#57 · harness: --log_samples · ggbetz · opened 4 months ago · 0
#56 · Evaluate: Gemma 2 · ggbetz · closed 1 month ago · 4
#55 · Evaluate: CohereForAI/aya-23-XXB · ggbetz · closed 1 month ago · 1
#54 · Transformers version · ggbetz · closed 1 month ago · 1
#53 · Evaluate: 01-ai/Yi-1.5-34B-Chat · ggbetz · closed 5 months ago · 1
#52 · Evaluate: tiiuae/falcon-11B · ggbetz · closed 1 month ago · 1
#51 · Evaluate: microsoft/Phi-3 · ggbetz · closed 1 month ago · 3
#50 · Evaluate: nvidia/nemotron-3-8b-XXX · ggbetz · closed 6 months ago · 2
#49 · Evaluate: meta-llama/Meta-Llama-3-XXX · ggbetz · closed 6 months ago · 0
#48 · Why are many CoT reasoning traces empty? · ggbetz · closed 1 month ago · 1
#47 · Evaluate: jetmoe/jetmoe-8b | -8b-sft | -8b-chat · ggbetz · closed 1 month ago · 2
#46 · new cot-eval-traces ds structure · ggbetz · closed 7 months ago · 0
#45 · Evaluate: Qwen/Qwen1.5-MoE-XX · ggbetz · closed 1 month ago · 1
#44 · Evaluate: CohereForAI/c4ai-command-r-plus · ggbetz · closed 1 month ago · 5
#43 · Evaluate: core42/jais-XX · ggbetz · closed 1 month ago · 2
#42 · Evaluate: internlm/internlm2-math-XX · ggbetz · closed 6 months ago · 0
#41 · Evaluate: internlm/internlm2-XX · ggbetz · closed 6 months ago · 0
#40 · Evaluate: CohereForAI/c4ai-command-r-v01 · ggbetz · closed 1 month ago · 0
#39 · Evaluate: allenai/OLMo-1B · ggbetz · closed 1 month ago · 1
#38 · Evaluate: openbmb/Eurus-7b-kto · ggbetz · closed 7 months ago · 0
#37 · Evaluate: openbmb/Eurus-70b-sft · ggbetz · closed 7 months ago · 2
#36 · Evaluate: databricks/dbrx-instruct · ggbetz · closed 5 months ago · 1
#35 · Evaluate: databricks/dbrx-base · ggbetz · closed 1 month ago · 0
#34 · Evaluate: ai21labs/Jamba-v0.1 · ggbetz · closed 1 month ago · 0
#33 · Evaluate: 01-ai/Yi-34B-Chat · ggbetz · closed 6 months ago · 1
#32 · Evaluate: allenai/tulu-2-dpo-7b · ggbetz · closed 8 months ago · 0
#31 · Evaluate: Qwen/Qwen-72B-Chat · ggbetz · closed 7 months ago · 3
#30 · Evaluate: openchat/openchat-3.5-0106-gemma · ggbetz · closed 1 month ago · 2
#29 · Evaluate: upstage/SOLAR-10.7B-Instruct-v1.0 · ggbetz · closed 7 months ago · 0
#28 · Evaluate: upstage/SOLAR-10.7B-v1.0 · ggbetz · closed 7 months ago · 0
#27 · Evaluate: Qwen/Qwen1.5-XX-Chat · ggbetz · closed 1 month ago · 1
#26 · Evaluate: Qwen/Qwen1.5-XX · ggbetz · closed 1 month ago · 3
#25 · Evaluate: 01-ai/Yi-34B · ggbetz · closed 7 months ago · 0
#24 · Evaluate: google/gemma-7b-it · ggbetz · closed 7 months ago · 0
#23 · Evaluate: google/gemma-7b · ggbetz · closed 6 months ago · 6
#22 · Evaluate: NousResearch/Nous-Hermes-Llama2-13b · ggbetz · closed 7 months ago · 3
#21 · Evaluate: allenai/tulu-2-dpo-13b · ggbetz · closed 8 months ago · 1