liuzhuang1024 closed this issue 1 year ago
unless you want to give it a shot with a PR
Maybe, when I have free time.
k no prob, should be able to get this done by this week's end, and play around with speculative decoding too
@liuzhuang1024 hey, started playing around with spec decoding, and decided to circle back to this issue https://github.com/lucidrains/x-transformers/commit/87a0f13d7730869d1e3b4af384f6813e08fd0021 let me know if it works ok
also, if anyone knows any paper with interesting (and unimplemented) ideas for speeding up causal transformer sampling, do share it now while my attention is on this
Some papers propose pruning the KV cache to speed up long-sequence generation, based on the attention scores each cached position has received over the generation history.
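Roughly, the idea (in the spirit of H2O, the "Heavy-Hitter Oracle" paper by Zhang et al. 2023) looks something like the sketch below. This is hypothetical code for illustration, not the x-transformers API; the function name and shapes are assumptions:

```python
import torch

def prune_kv_cache(keys, values, scores, budget):
    # keys, values: (batch, heads, seq, dim_head) cached keys / values
    # scores:       (batch, heads, seq) attention mass each cached position
    #               has received, accumulated over past decoding steps
    # budget:       max number of cached positions to keep per head

    if keys.shape[-2] <= budget:
        return keys, values, scores

    # keep the positions with the highest accumulated attention,
    # re-sorted so the cache stays in positional order
    keep = scores.topk(budget, dim = -1).indices.sort(dim = -1).values

    keep_expanded = keep.unsqueeze(-1).expand(-1, -1, -1, keys.shape[-1])
    keys = keys.gather(-2, keep_expanded)
    values = values.gather(-2, keep_expanded)
    scores = scores.gather(-1, keep)
    return keys, values, scores
```

H2O additionally always keeps a window of the most recent tokens alongside the heavy hitters, since recent tokens often receive high attention later.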
@LouChao98 ah nice, but i don't know if i believe in that route
i think this should be completed
i'll get around to some savings with absolute positional embedding at a later date
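roughly, the saving would be that with a kv cache only the new positions need embedding, offset by the cache length. a minimal sketch, loosely modeled on the x-transformers class of the same name; the `cache_len` argument is my assumption of what the change would look like, not the actual api:

```python
import torch
from torch import nn

class AbsolutePositionalEmbedding(nn.Module):
    def __init__(self, dim, max_seq_len):
        super().__init__()
        self.emb = nn.Embedding(max_seq_len, dim)

    def forward(self, x, cache_len = 0):
        # x: (batch, seq, dim) - with a kv cache, seq is usually 1,
        # so only one row of the embedding table is looked up per step
        seq_len = x.shape[1]
        pos = torch.arange(cache_len, cache_len + seq_len, device = x.device)
        return x + self.emb(pos)
```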
oh yup, can add this
was going to play around with speculative and contrastive decoding soon too
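for anyone following along, the core loop of speculative decoding (Leviathan et al. 2023 / Chen et al. 2023) goes roughly as below. a minimal sketch, assuming `target` and `draft` are causal LMs mapping (1, seq) token ids to logits over the whole sequence, batch size 1, and no kv caching for brevity; all names here are hypothetical:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def speculative_decode(target, draft, prompt, num_tokens, gamma = 4):
    seq = prompt  # (1, seq) token ids

    while seq.shape[-1] < prompt.shape[-1] + num_tokens:
        # 1. the cheap draft model proposes gamma tokens autoregressively
        draft_seq, draft_probs = seq, []
        for _ in range(gamma):
            probs = F.softmax(draft(draft_seq)[:, -1], dim = -1)
            token = torch.multinomial(probs, 1)
            draft_probs.append(probs)
            draft_seq = torch.cat((draft_seq, token), dim = -1)

        # 2. the target model scores all proposals in one forward pass
        target_probs = F.softmax(target(draft_seq)[:, seq.shape[-1] - 1:], dim = -1)

        # 3. accept each draft token with probability min(1, p_target / p_draft)
        num_accepted = 0
        for i in range(gamma):
            token = draft_seq[0, seq.shape[-1] + i]
            p, q = target_probs[0, i, token], draft_probs[i][0, token]
            if torch.rand(()) < (p / q).clamp(max = 1.):
                num_accepted += 1
            else:
                # rejected: resample from the adjusted distribution max(0, p - q),
                # which keeps the output distribution exactly that of the target
                adjusted = (target_probs[0, i] - draft_probs[i][0]).clamp(min = 0.)
                next_token = torch.multinomial(adjusted / adjusted.sum(), 1)
                break
        else:
            # all gamma accepted: sample one bonus token from the target model
            next_token = torch.multinomial(target_probs[0, gamma], 1)

        seq = torch.cat((draft_seq[:, :seq.shape[-1] + num_accepted], next_token.view(1, 1)), dim = -1)

    return seq
```

the draft model proposes gamma tokens cheaply, the target model verifies them all in a single forward pass, and the accept/resample rule guarantees the outputs are distributed exactly as if sampled from the target model alone.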