Closed Neph0s closed 1 year ago
yea, that does seem like a flaw with the paper. you could write a follow-up paper addressing this
if i had to take a wild stab at this problem: while it is decoding, align the sequence once you see that it is properly copying the passage, then downweight the attention logits for the part of the passage that lies in the future of the current decoding step. but that could easily be a 1-2 month research project
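a rough toy sketch of that idea (assuming we already have the pre-softmax attention logits over the source passage for one decoding step, and an `aligned_pos` telling us how far into the passage the copy has progressed; `downweight_future_logits` and `penalty` are made-up names, not anything from the paper or this repo):

```python
import torch

def downweight_future_logits(attn_logits: torch.Tensor,
                             aligned_pos: int,
                             penalty: float = 10.0) -> torch.Tensor:
    # attn_logits: (passage_len,) attention logits over the source passage
    # for the current decoding step, before softmax.
    # aligned_pos: index of the passage token the decoder is currently
    # copying; everything after it is "the future" of the copy and gets
    # its logit reduced so the model can't peek ahead.
    out = attn_logits.clone()
    out[aligned_pos + 1:] -= penalty
    return out
```

the hard (unsketched) part is the alignment itself, i.e. detecting that the model is copying and finding `aligned_pos` at each step.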
edit: or maybe with a big enough model, you can just ask it nicely not to do that :laughing: :man_shrugging:
good luck
When I wanted to implement Toolformer, I found this problem:
Consider this input: In one hour, there are 3 sets of 20 minutes. So, Joy can read 8 x 3 = 24 pages in an hour. It will take her 120/24 = 5 hours to read 120 pages.
With the API generation steps, it eventually becomes: In one hour, there are 3 sets of 20 minutes. So, Joy can read 8 x 3 = 24 pages in an hour. It will take her [CALCULATOR(24 * 5) -> 120.00] 120/24 = 5 hours to read 120 pages.
This result is unexpected: the API call CALCULATOR(24 * 5) includes the argument 5, which is only mentioned in the text after the call, so the call is misplaced. However, it cannot be caught by the filtering step either, because the call and its result contain 24, 5 and 120, tokens that appear later in the original text, so inserting it actually decreases the perplexity of the continuation.
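For context, the filtering step being referred to keeps a call only when inserting the call together with its result lowers the LM loss on the following tokens by at least a threshold tau, compared to the better of (no call, call without result). A minimal sketch of just that decision rule (the function and argument names are mine, and the losses would come from the actual language model):

```python
def keep_api_call(loss_no_call: float,
                  loss_call_no_result: float,
                  loss_call_with_result: float,
                  tau: float = 1.0) -> bool:
    # Keep the call when inserting it (with its result) reduces the loss
    # on the *following* tokens by at least tau, relative to the better of
    # "no call at all" and "call without result".
    baseline = min(loss_no_call, loss_call_no_result)
    return baseline - loss_call_with_result >= tau
```

This makes the failure mode clear: the misplaced call's result "120.00" (and the leaked 24 and 5) make the later tokens "120/24 = 5" cheaper to predict, so the loss drop clears the threshold and the call survives filtering.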
How can this problem be solved?