hendrycks / apps

APPS: Automated Programming Progress Standard (NeurIPS 2021)
MIT License

Too Long Problems #22

Open MT010104 opened 2 years ago

MT010104 commented 2 years ago

There are some long problems in APPS, so I truncated them after encoding. But the model's output takes the form "problem + answer", so the output is necessarily longer than the input. The output's max_length is set to 1024 - len(input_ids). So in practice, for the output to have room for a complete answer, the input must be much shorter than 1024; otherwise we won't get a complete answer even if no error is reported. Is that right? Also, why is the output's max_length set to "1024 - input length"?
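The arithmetic behind this can be sketched as follows (a minimal illustration; `MODEL_MAX` and `output_budget` are hypothetical names, and token ids are stand-in integers rather than real tokenizer output):

```python
# Sketch of the output-budget problem: input and output share one context
# window, so the room left for the generated solution shrinks as the
# problem statement grows.  Not the repo's actual code.

MODEL_MAX = 1024  # GPT-2-style context window shared by prompt and answer

def output_budget(input_ids):
    """Tokens left for the generated solution after the prompt is counted."""
    return MODEL_MAX - len(input_ids)

# A short problem leaves plenty of room for the solution...
print(output_budget(list(range(200))))   # 824 tokens for the answer

# ...but a near-limit problem leaves almost none, so generation stops
# before a complete program can be produced.
print(output_budget(list(range(1000))))  # only 24 tokens for the answer
```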

xksteven commented 2 years ago

Where in the code is this? It's hard to search from pictures.

MT010104 commented 1 year ago

generate_gpt_codes.py #176

xksteven commented 1 year ago

We can make that into a variable that gets passed in, and possibly throw a warning when the value is less than some threshold, like 10. Let me know if you think this would be a good compromise.
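A minimal sketch of that proposal (the function name, the `model_max_length` parameter, and the threshold constant are all illustrative, not the repo's API):

```python
import warnings

MIN_OUTPUT_TOKENS = 10  # threshold suggested above; name is hypothetical

def remaining_output_tokens(input_ids, model_max_length=1024):
    """Return the generation budget, warning when it gets too small."""
    budget = model_max_length - len(input_ids)
    if budget < MIN_OUTPUT_TOKENS:
        warnings.warn(
            f"Only {budget} tokens left for generation; "
            "the solution will almost certainly be incomplete."
        )
    return budget
```

With `model_max_length` passed in rather than hard-coded, the same code could serve both a 1024-token and a 2048-token model.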

MT010104 commented 1 year ago

Sorry, I couldn't make sense of that. Long inputs lead to longer outputs, but if "max_length" is set larger, such as 2048, the same error still occurs.

xksteven commented 1 year ago

Do you have a proposal then?

Some models have a max input size that needs to be respected.

You could consider preprocessing your inputs to use fewer tokens.

Open to alternative suggestions.

NUAAZXY commented 1 year ago

I encountered the same problem. In the end, I set truncation=True while encoding, but will this cause other problems?
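One thing to be aware of: Hugging Face tokenizers truncate from the right by default, which drops the end of the prompt rather than the beginning. The difference can be sketched with plain lists standing in for token ids (the helper names are illustrative):

```python
# Right-truncation (what truncation=True does by default) drops the END
# of the prompt, which for an APPS prompt is the part nearest the
# "ANSWER:" marker; left-truncation keeps the tail and drops earlier
# context instead.  Token ids are stand-in ints.

def truncate_right(ids, max_len):
    return ids[:max_len]   # default tokenizer behavior

def truncate_left(ids, max_len):
    return ids[-max_len:]  # keeps the end of the question

ids = list(range(12))
print(truncate_right(ids, 8))  # loses the prompt's tail
print(truncate_left(ids, 8))   # loses the earlier context instead
```

So truncation=True avoids the length error, but it may silently cut off the part of the problem statement the model most needs to see.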

MT010104 commented 1 year ago

I used the data provided by APPS and did not change it. As far as I know, GPT-2 has a max input size of 1024 and GPT-Neo has a max input size of 2048, but neither imposes an explicit limit on the output on its own. If there is really no way around it, we may have to discard these very long questions.

xksteven commented 1 year ago

Sorry for the confusion. We used GPT-Neo, which has a max length of 2048, so by capping the input to 1024 it still had 1024 tokens of room to output a solution. For GPT-2 I can see how this is an issue; I'll have to double-check how we evaluated that. We had to truncate from the beginning of the questions and set the max to be smaller.
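The setup described above can be sketched as follows (constants and the helper name are illustrative, not the repo's exact code):

```python
# Sketch of the GPT-Neo configuration described above: a 2048-token
# context window with the question capped at 1024 tokens, truncated
# from the BEGINNING so the end of the problem statement survives and
# a full 1024 tokens remain for the generated solution.

CONTEXT_WINDOW = 2048      # GPT-Neo max length
MAX_PROMPT_TOKENS = 1024   # cap on the encoded question

def prepare_prompt(input_ids):
    ids = input_ids[-MAX_PROMPT_TOKENS:]  # drop tokens from the start
    budget = CONTEXT_WINDOW - len(ids)    # always >= 1024 here
    return ids, budget

ids, budget = prepare_prompt(list(range(1500)))
print(len(ids), budget)  # 1024 1024
```

Under this scheme the output budget never drops below 1024, which is why the issue only bites when the same 1024 cap is reused with a model whose whole context is 1024.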