logikon-ai / cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License

Evaluate: Qwen/Qwen1.5-XX #26

Closed by ggbetz 1 month ago

ggbetz commented 8 months ago

For {XX} in [0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B]:

Check:

Parameters:

NEXT_MODEL_PATH=Qwen/Qwen1.5-{XX}
NEXT_MODEL_REVISION=main
NEXT_MODEL_PRECISION=bfloat16
MAX_LENGTH=2048 
GPU_MEMORY_UTILIZATION=0.7
VLLM_SWAP_SPACE=8
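
For reference, the per-size runs could be parameterized with a loop like the following sketch. The variable names match the parameters above; the exact entry point the repo expects is not shown here, so this only demonstrates how the environment would be set for each size.

```shell
#!/bin/sh
# Sketch: iterate over the Qwen1.5 sizes listed above and export the
# evaluation parameters for each run. How these variables are consumed
# (e.g. by which cot-eval entry point) is an assumption left open here.
for XX in 0.5B 1.8B 4B 7B 14B 32B 72B; do
  export NEXT_MODEL_PATH="Qwen/Qwen1.5-${XX}"
  export NEXT_MODEL_REVISION=main
  export NEXT_MODEL_PRECISION=bfloat16
  export MAX_LENGTH=2048
  export GPU_MEMORY_UTILIZATION=0.7
  export VLLM_SWAP_SPACE=8
  echo "Configured ${NEXT_MODEL_PATH} (revision=${NEXT_MODEL_REVISION})"
done
```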
ggbetz commented 7 months ago

Qwen models fail to generate reasoning traces; see https://github.com/logikon-ai/cot-eval/blob/f9bfe8f757edbed49324df680214a24fbde37213/src/cot_eval/__main__.py#L139C1-L146C53

ggbetz commented 7 months ago

This might, however, be related to https://github.com/logikon-ai/cot-eval/issues/48, since I've only been testing the smallest base model...

ggbetz commented 2 months ago

Let's skip 1.5 and go directly for Qwen2...