zhaoxlpku / HKU-DASC7606-A2

length extrapolation #4

Open tengwang0318 opened 4 months ago

tengwang0318 commented 4 months ago

Hi, I noticed you mentioned that we can try linear interpolation, or something else like NTK, to address the length limitation. I'm quite confused: the question that needs to be answered is at the last position, so how can the different chunks "communicate" with each other? I know a sliding window can work here, but I don't think it makes sense for this task. Do you have any suggestions?

zhaoxlpku commented 3 months ago

Great question! The goal of increasing the maximum sequence length is to allow for more in-context examples. This could potentially enhance the model's performance (though it's not confirmed). We encourage you to experiment with it and discover the impact yourself.
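For reference, here is a minimal sketch of the two techniques named above. It assumes the model uses rotary position embeddings (RoPE), which this thread does not state, and every function name and default value is hypothetical, not taken from the assignment code. Both methods rescale the rotary angles so that a longer prompt can be processed in a single forward pass, which means no chunking is involved and the final question can still attend to every in-context example.

```python
import math

def rope_inv_freq(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def linear_interpolation_angles(position, head_dim, scale, base=10000.0):
    """Linear (position) interpolation: shrink the position index by `scale`
    so that positions in the extended window map back into the trained range."""
    return [(position / scale) * f for f in rope_inv_freq(head_dim, base)]

def ntk_scaled_base(head_dim, scale, base=10000.0):
    """NTK-aware scaling: enlarge the RoPE base instead of shrinking positions.
    Uses the commonly cited rule base * scale^(d / (d - 2))."""
    return base * scale ** (head_dim / (head_dim - 2))

def ntk_angles(position, head_dim, scale, base=10000.0):
    """Rotary angles computed with the NTK-scaled base; positions are unchanged."""
    new_base = ntk_scaled_base(head_dim, scale, base)
    return [position * f for f in rope_inv_freq(head_dim, new_base)]

if __name__ == "__main__":
    head_dim, scale = 64, 2.0  # hypothetical values, for illustration only
    print(linear_interpolation_angles(4096, head_dim, scale)[:3])
    print(ntk_angles(4096, head_dim, scale)[:3])
```

With `scale=2`, linear interpolation gives position 4096 the same angles that position 2048 had in the original model. NTK-aware scaling leaves the highest-frequency component untouched and slows the lowest-frequency component by the same factor of `scale`. Whether either variant helps on this task is something to check experimentally.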

tengwang0318 commented 3 months ago

Thanks a lot. After experimenting with length extrapolation in my own way, I think it improves performance by 2% compared with my current best model. With this part added, my report may no longer fit within 4 pages. Could I write a little more in the report file, say 5 pages?

zhaoxlpku commented 3 months ago

Sure, you can add more pages, just like the extra appendix pages that many conferences allow (e.g., NeurIPS, ICLR).