logikon-ai / cot-eval

A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
https://huggingface.co/spaces/logikon/open_cot_leaderboard
MIT License

harness: --log_samples #57

Open ggbetz opened 4 months ago

ggbetz commented 4 months ago

Pass --log_samples when calling the harness, and upload the logged samples to a separate repo for later diagnostics.

See: https://github.com/EleutherAI/lm-evaluation-harness/issues/1842
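A rough sketch of what this could look like, not the project's actual implementation. It assumes the harness is invoked through the `lm_eval` CLI (where `--log_samples` has to be combined with `--output_path`) and that the samples go to a Hugging Face dataset repo via `huggingface_hub`. The function names, the `hf` model backend, and any repo ID or task names passed in are hypothetical placeholders.

```python
import subprocess


def build_harness_cmd(model_args: str, tasks: list[str], output_dir: str) -> list[str]:
    """Build an lm_eval call that also writes per-sample outputs to output_dir."""
    return [
        "lm_eval",
        "--model", "hf",              # assumption: cot-eval may use another backend
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--output_path", output_dir,  # required when --log_samples is set
        "--log_samples",
    ]


def run_and_upload(cmd: list[str], output_dir: str, repo_id: str) -> None:
    """Run the harness, then push everything in output_dir to a separate repo."""
    subprocess.run(cmd, check=True)

    # Imported lazily so the command builder works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    # Hypothetical choice: a private dataset repo dedicated to sample logs.
    api.create_repo(repo_id, repo_type="dataset", private=True, exist_ok=True)
    api.upload_folder(folder_path=output_dir, repo_id=repo_id, repo_type="dataset")
```

Keeping the samples in their own repo, separate from the aggregated results, would let the diagnostics data grow without bloating the results repo.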