xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding
Apache License 2.0

Add GSM-Symbolic evals #70

Open e271828- opened 1 month ago

e271828- commented 1 month ago

GSM-Symbolic looks well-suited for showing entropix's improvements with smaller models, in particular on the GSM-NoOp variant: https://arxiv.org/abs/2410.05229

[Two screenshots attached, dated 2024-10-11]

e271828- commented 1 month ago

Fails on my local MLX branch:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?<|eot_id|><|start_header_id|>assistant<|end_header_id|><|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Let's dive into Oliver's thought process.

Oliver starts with 44 kiwis on Friday. He then picks 58 kiwis on Saturday, which is a straightforward addition. So, he now has 44 + 58 = 102 kiwis.

On Sunday, he decides to double the number of kiwis he picked on Saturday, but then he misses 5 kiwis. So, he has 58 kiwis (the number on Saturday) - 5 (missing kiwis) = 53 kiwis.
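
For reference, here is a minimal sketch of the arithmetic the prompt asks for. It assumes the remark about five smaller kiwis is the GSM-NoOp distractor and should not change the count. The variable names are illustrative only and are not part of entropix or of any eval harness.

```python
# Quantities stated in the prompt above.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# Assumption: the "five of them were a bit smaller" clause is a no-op,
# since smaller kiwis still count as kiwis.
expected_total = friday + saturday + sunday

# Figures the model produced in the transcript above.
model_after_saturday = 44 + 58  # this step matches the prompt
model_sunday = 58 - 5           # uses Saturday's count, never doubles, subtracts the 5

print("expected total:", expected_total)
print("model's Sunday figure:", model_sunday, "vs. intended:", sunday)
```

The run goes wrong on the Sunday step in two ways: it anchors on Saturday's count instead of Friday's, and it treats the size remark as a subtraction.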