instructlab / taxonomy

Taxonomy tree that will allow you to create models tuned with your data
Apache License 2.0
178 stars, 621 forks

Skill Knowhow: LLMs are bad at math #167

Closed by obuzek 6 months ago

obuzek commented 6 months ago

David brings up a great point that LLMs are well known to be poor at math. Even if we can make them behave correctly on examples with small numbers, that behavior is likely to extrapolate poorly to large numbers and long texts.

Questions for the Lab inventors:

cc @akashgit @katesoule @xukai92 @juliadenham

@harshmadhani From what I understand, LLMs are inherently bad at math, including counting. They operate on tokens and only select the next token to output based on probabilities. They do not know how to do math.

So if the model once learned that 1 + 1 = 2, it cannot suddenly do arbitrary additions. It might only do that calculation (that it learned) "better". Same with counting.

What I have read so far is that LLMs can only deal with math properly once they use some external helper service. Same with counting tokens.

So my understanding is that no "skill" here can fix that. You might only be able to teach it some instances of such math/counting problems, but not how to do it universally.

I might be completely wrong and am happy to learn why that is not the case. But there is usually a reason why this LLM is *that* bad at math+counting while it is *that* good at stuff that doesn't involve math/counting :)

_Originally posted by @davidhildenbrand in https://github.com/instruct-lab/taxonomy/issues/134#issuecomment-1983512059_
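
To make the distinction in the quoted comment concrete, here is a minimal Python sketch contrasting "only knows the sums it has seen" with "hands the arithmetic to an external helper". It is only an illustration of the argument, not anything from InstructLab or a real model: `MEMORIZED`, `calculator`, `answer_from_memory`, and `answer_with_helper` are hypothetical names, and the lookup table is a deliberately crude stand-in for learned examples.

```python
import ast
import operator

# Hypothetical stand-in for "instances the model has learned".
MEMORIZED = {"1 + 1": "2", "2 + 2": "4"}

# Operators the helper is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str):
    """Hypothetical external helper: exact arithmetic without eval()."""
    def _eval(node):
        # Plain numeric literal (bools are excluded on purpose).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Binary operation such as a + b or a * b.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Unary minus, e.g. -3.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


def answer_from_memory(expression: str):
    """Return a memorized answer, or None if the instance was never seen."""
    return MEMORIZED.get(expression)


def answer_with_helper(expression: str) -> str:
    """Delegate the computation to the helper instead of recalling it."""
    return str(calculator(expression))


print(answer_from_memory("1 + 1"))                      # 2
print(answer_from_memory("123456789 + 987654321"))      # None
print(answer_with_helper("123456789 + 987654321"))      # 1111111110
```

The lookup table only answers what it was given, while the helper handles any expression in its small grammar. That is the gap the comment describes, under the (simplifying) assumption that learned math behaves like recall of seen instances.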

obuzek commented 6 months ago

The existence of the quantitative skill tree suggests that we do expect math-related skills.

markmc commented 6 months ago

Please take a look at #200 - it may be an example of how a skill could improve scores on math questions without actually attempting to teach the model to "do math".

obuzek commented 6 months ago

I agree, that one and #231 are fascinating examples. Added a "skill category: abstract math" label so we can consider them together.

obuzek commented 6 months ago

Based on this week's experience, I think we can assume that "math" and "computation"-type skills are generally going to be rejected. Abstract math might still be a special case, but even so it's likely to be rejected until we establish a "reasoning"-related foundational skill.

People have definitely submitted some very cool things on this topic, so we'll have lots to choose from when we start opening up in that direction!