lucidrains / toolformer-pytorch

Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI
MIT License

Misplaced API calls, which can pass the filtering step. #5

Closed Neph0s closed 1 year ago

Neph0s commented 1 year ago

When I tried to implement Toolformer, I ran into this problem:

Consider this input: In one hour, there are 3 sets of 20 minutes. So, Joy can read 8 x 3 = 24 pages in an hour. It will take her 120/24 = 5 hours to read 120 pages.

After the API generation steps, it eventually becomes: In one hour, there are 3 sets of 20 minutes. So, Joy can read 8 x 3 = 24 pages in an hour. It will take her [CALCULATOR(24 * 5) -> 120.00] 120/24 = 5 hours to read 120 pages.

This result is not expected, since the API call CALCULATOR(24 * 5) includes the argument 5, which is only mentioned after the position of the call. I think this API call is misplaced. However, the filtering step cannot remove it: the call contains 24, 5 and 120, all of which appear later in the text, so inserting it does decrease the perplexity.
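To make the issue concrete, here is a minimal sketch of a heuristic check that flags calculator calls whose numeric arguments appear only in the text after the call. It is not part of toolformer-pytorch or of the paper's filtering step. The names `leaked_arguments` and `numbers_in`, and the `[CALCULATOR(...) -> ...]` pattern, are assumptions made for illustration.

```python
import re

# Hypothetical pattern for an inserted call such as "[CALCULATOR(24 * 5) -> 120.00]".
API_CALL = re.compile(r"\[CALCULATOR\((?P<args>[^)]*)\)\s*->\s*(?P<result>[^\]]*)\]")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text):
    """Return the set of numeric literals found in text, as floats."""
    return {float(tok) for tok in NUMBER.findall(text)}


def leaked_arguments(annotated):
    """List (call, args) pairs where args occur only after the call position."""
    leaks = []
    for match in API_CALL.finditer(annotated):
        seen_before = numbers_in(annotated[:match.start()])
        seen_after = numbers_in(annotated[match.end():])
        args = numbers_in(match.group("args"))
        # An argument is suspicious if the model could only have copied it
        # from text that comes after the position of the call.
        future_only = sorted(a for a in args if a not in seen_before and a in seen_after)
        if future_only:
            leaks.append((match.group(0), future_only))
    return leaks


example = (
    "In one hour, there are 3 sets of 20 minutes. So, Joy can read 8 x 3 = 24 "
    "pages in an hour. It will take her [CALCULATOR(24 * 5) -> 120.00] "
    "120/24 = 5 hours to read 120 pages."
)
print(leaked_arguments(example))
```

On the example above, the check flags 5 as the only argument that is absent before the call and present after it. This is only a surface heuristic for numeric arguments; it would not catch paraphrased or derived values.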

How can this problem be solved?

lucidrains commented 1 year ago

yea, that does seem like a flaw with the paper. you could write a follow up paper addressing this

lucidrains commented 1 year ago

if i had to make a wild stab at this problem, what i'd do is, while it is decoding, align the sequence once you see that it is properly copying, and then downweight the attention logits of the passage that is in the future of the current decoding step. but that could easily be a 1-2 month research project
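A toy sketch of only the downweighting part of that idea, in plain Python: it assumes the alignment step has already produced `aligned_pos`, the index of the source token currently being copied, and it subtracts a fixed penalty from the attention logits of every later source position. The function names and the `penalty` value are hypothetical, and this is not something the repository implements.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def downweight_future(logits, aligned_pos, penalty=5.0):
    """Subtract penalty from logits of source positions after aligned_pos."""
    return [x - penalty if i > aligned_pos else x for i, x in enumerate(logits)]


# Four source tokens with equal raw attention logits; decoding is aligned to index 1.
raw = [1.0, 1.0, 1.0, 1.0]
weights = softmax(downweight_future(raw, aligned_pos=1))
print(weights)
```

With a soft penalty the future tokens keep a small amount of attention mass rather than being masked out entirely. Whether that is enough to stop the model copying future arguments is an open question.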

edit: or maybe with a big enough model, you can just ask it nicely not to do that :laughing: :man_shrugging:

good luck