logikon-ai / cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License

Why are many CoT reasoning traces empty? #48

Open ggbetz opened 3 months ago

ggbetz commented 3 months ago

In the Dataset Viewer of cot-leaderboard/cot-eval-traces-2.0, it appears that many reasoning traces are empty. Is this a bug?

ggbetz commented 3 months ago

We've now taken a closer look. It appears that it is mainly base models, which have not been instruction-tuned, that fail to follow the instruction to reason step by step and therefore generate no reasoning traces at all. This makes sense.
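One way to check this explanation is to compute, per model, the share of rows whose reasoning trace is empty, and then compare base models with instruction-tuned ones. The sketch below does this over any iterable of dict-like rows. It could be fed, for example, rows from cot-leaderboard/cot-eval-traces-2.0 loaded with the Hugging Face `datasets` library. The column names `model` and `reasoning_trace` are hypothetical placeholders, not the dataset's confirmed schema, and should be adjusted to the actual fields.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def empty_trace_rate(
    records: Iterable[Mapping[str, Any]],
    model_key: str = "model",            # hypothetical column name
    trace_key: str = "reasoning_trace",  # hypothetical column name
) -> dict[str, float]:
    """Return, for each model, the fraction of rows whose reasoning trace is empty.

    A trace counts as empty if the field is missing, None, or whitespace only.
    """
    totals: dict[str, int] = defaultdict(int)
    empties: dict[str, int] = defaultdict(int)
    for rec in records:
        model = rec[model_key]
        trace = rec.get(trace_key)
        totals[model] += 1
        if trace is None or not str(trace).strip():
            empties[model] += 1
    # Models with no empty traces get 0.0 (defaultdict yields 0 for them).
    return {m: empties[m] / totals[m] for m in totals}


if __name__ == "__main__":
    # Made-up toy rows for illustration only; not taken from the dataset.
    toy = [
        {"model": "base-model", "reasoning_trace": ""},
        {"model": "base-model", "reasoning_trace": "Step 1: ..."},
        {"model": "instruct-model", "reasoning_trace": "Step 1: ..."},
    ]
    print(empty_trace_rate(toy))  # {'base-model': 0.5, 'instruct-model': 0.0}
```

If the explanation holds, base models would show markedly higher rates than their instruction-tuned counterparts.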