paperswithcode / galai

Model API for GALACTICA
Apache License 2.0

Inference Speed for Long Articles #68

Open saptarshi059 opened 1 year ago

saptarshi059 commented 1 year ago

Is there any way to increase the generation speed for extremely long articles, e.g. around 5000 tokens? I've been trying to apply several optimization tricks, but none of them seems to help. Or is it just the case that text generation over such long spans will always be slow and there's no way around it?
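
(The "optimization tricks" aren't spelled out in this thread; a minimal sketch of the usual first attempts with the Hugging Face GALACTICA checkpoints might look like the following. The checkpoint name and all settings are illustrative assumptions, not details from this issue.)

```python
# A hedged sketch of common generation speed-ups with Hugging Face transformers.
# The checkpoint name and all settings are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "facebook/galactica-1.3b"  # assumed checkpoint; other GALACTICA sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision: less memory traffic, faster matmuls on GPU
    device_map="auto",          # place the weights on the available GPU(s)
)

inputs = tokenizer("The Transformer architecture", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    use_cache=True,   # reuse past key/value states so each new token is cheap to compute
    do_sample=False,  # greedy decoding avoids sampling overhead
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```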

magicknight commented 1 year ago

> Is there any way to increase the generation speed for extremely long articles, e.g. around 5000 tokens? I've been trying to apply several optimization tricks, but none of them seems to help. Or is it just the case that text generation over such long spans will always be slow and there's no way around it?

How do you generate 5000 tokens in the first place?

saptarshi059 commented 1 year ago

So there's no way to directly generate ~5000 tokens; that's a limitation of any decoder-based model, since it can only process tokens up to its maximum input length, which in this case is 2048. What I was doing instead was to generate up to the 2048-token limit and then use the last N (say 150) tokens as the input for generating new text (almost like a sliding window). Done this way, the resulting text was reasonably coherent.
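
A minimal sketch of that sliding-window loop, assuming the Hugging Face transformers checkpoint for GALACTICA. The 2048-token context, the 150-token window, and the ~5000-token target come from the numbers in this thread; the checkpoint name, prompt, and decoding settings are assumptions.

```python
# Sliding-window generation sketch: keep feeding back only the last WINDOW tokens
# so the prompt never exceeds the model's maximum context length.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "facebook/galactica-1.3b"  # assumed checkpoint
MAX_CONTEXT = 2048                      # GALACTICA's maximum input length
WINDOW = 150                            # last-N tokens carried into the next chunk
TARGET_TOKENS = 5000                    # total number of tokens we ultimately want

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

prompt = "# Review of transformer architectures\n\n"  # illustrative prompt
generated = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

while generated.shape[1] < TARGET_TOKENS:
    # Only the last WINDOW tokens are fed back in, so context + new tokens
    # always fit inside the 2048-token limit.
    context = generated[:, -WINDOW:]
    out = model.generate(
        context,
        max_new_tokens=min(MAX_CONTEXT - WINDOW, TARGET_TOKENS - generated.shape[1]),
        do_sample=True,
        top_p=0.9,
    )
    # generate() returns context + new tokens; append only the new part.
    new_tokens = out[:, context.shape[1]:]
    generated = torch.cat([generated, new_tokens], dim=1)

text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(text[:2000])
```

The trade-off in this scheme is that the model only ever sees the last N tokens, so long-range coherence depends on how much context the window carries; a larger window keeps more context but leaves less headroom for new tokens in each step.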