Closed: RonanKMcGovern closed this issue 9 months ago.
I'm not one of the authors, but I can answer.
As for your idea, it's certainly viable to use either [MASK] tokens or 0-embeddings. It may improve performance, but I doubt the improvement will be huge; you're welcome to try it out.
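For intuition, here is a toy sketch of the initialization being discussed. Everything here is hypothetical: the "model" is just a deterministic bigram lookup standing in for a greedy-argmax forward pass, and `PAD` stands in for a [PAD]/[MASK]-style token used to initialize the guess positions instead of random vocabulary tokens.

```python
# Toy Jacobi-style parallel decoding. A real LLM would replace BIGRAM with a
# batched forward pass; this sketch only illustrates the fixed-point loop and
# how the guess initialization (PAD vs. a random token) plugs in.

BIGRAM = {0: 1, 1: 2, 2: 3, 3: 4, 4: 0}  # toy next-token table
PAD = 9  # hypothetical [PAD]/[MASK] id used to seed the guesses

def jacobi_decode(prompt_last, n_new, init_token):
    """Decode n_new tokens in parallel via Jacobi fixed-point iteration."""
    guess = [init_token] * n_new
    for step in range(1, n_new + 2):  # this toy converges in <= n_new + 1 steps
        prev = [prompt_last] + guess[:-1]        # token feeding each position
        new = [BIGRAM.get(p, 0) for p in prev]   # "parallel forward pass"
        if new == guess:                         # fixed point: all guesses stable
            return guess, step
        guess = new
    return guess, step
```

With a PAD init, `jacobi_decode(0, 4, PAD)` reaches the same tokens greedy autoregressive decoding would produce; at least one more position becomes correct per iteration, which bounds the step count.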
Thanks very much. As an aside, apparently TGI tried lookahead decoding but found little speedup for the added compute.
You mean Huggingface's transformers, right? Not TGI. But yeah, both Joao Gante and Louis-y-nlp in #19 noticed that you don't get much of a speedup if you don't have the FLOPS to spare.
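The "FLOPS to spare" point can be made with a back-of-the-envelope roofline estimate. All numbers below are assumptions for illustration (a 7B fp16 model on a GPU with ~1 TB/s bandwidth and ~150 TFLOP/s), not a benchmark: while batch-1 decoding is memory-bandwidth-bound, verifying a handful of extra guess tokens per step is nearly free, but the advantage vanishes once compute time catches up.

```python
# Back-of-the-envelope: is a decode step memory-bound or compute-bound?
# All hardware/model numbers are assumed for illustration.

params = 7e9             # assumed 7B-parameter model
bytes_per_param = 2      # fp16 weights
bandwidth = 1.0e12       # assumed 1 TB/s memory bandwidth
peak_flops = 150e12      # assumed fp16 throughput

mem_time = params * bytes_per_param / bandwidth  # time to stream the weights once
flops_per_token = 2 * params                     # ~2 FLOPs per parameter per token

def step_time(n_tokens):
    """Rough per-step latency when n_tokens are processed in parallel."""
    compute_time = n_tokens * flops_per_token / peak_flops
    return max(mem_time, compute_time)

# While memory-bound, extra guess tokens cost ~nothing:
assert step_time(8) == mem_time
```

Under these assumed numbers the crossover sits at well over a hundred parallel tokens, which is why the speedup evaporates on hardware (or batch sizes) without spare FLOPS.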
Yeah, makes sense. The comment I was referencing is this one: https://github.com/huggingface/text-generation-inference/issues/1169#issuecomment-1866069892
Thanks
I'm assuming that when Olivier Dehaene mentioned it was tested internally, he was referring to Joao Gante's test (Gante works at Huggingface). See https://github.com/huggingface/transformers/issues/27649#issuecomment-1824621466 for details.
Wow, yeah, that's a great post from Joao; thanks for sharing it. I didn't appreciate that FA2 compatibility was a consideration too.
Incidentally, it seems like the original Jacobi decoding paper uses [PAD] tokens instead of random vocabulary tokens.
Thanks for putting this blog together.
Regarding simple Jacobi decoding: to further improve the quality of guesses, would it be an idea to mask the input effect completely rather than guessing the tokens? My sense is that, because attention is so strong for nearby tokens, guessing the tokens is worse than passing through blank information at those guess positions. That would let the decoded output be based purely on information from the tokens we do know 100%.

Regarding lookahead: Jacobi is mentioned a lot in the blog, but I don't really see it as central. Basically, we're just randomly guessing a token that is W tokens away, using previous forward passes to improve the quality of guesses within the window, and then using those guesses as an n-gram database?
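The "guesses as an n-gram database" reading can be sketched as two small pieces: harvesting n-grams from earlier (possibly wrong) parallel-guess trajectories, then verifying a pooled candidate against the model. This is only a toy (the bigram table and function names are hypothetical stand-ins, not the blog's implementation); in a real model the whole candidate is checked in a single forward pass.

```python
# Toy sketch of an n-gram pool for lookahead-style decoding.
# BIGRAM is a hypothetical deterministic "model" (greedy argmax stand-in).

BIGRAM = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 0}

def greedy_next(tok):
    return BIGRAM.get(tok, 0)

def harvest_ngrams(trajectory, n=3):
    """Collect n-grams from a guess trajectory, keyed by their first token."""
    pool = {}
    for i in range(len(trajectory) - n + 1):
        pool.setdefault(trajectory[i], trajectory[i + 1 : i + n])
    return pool

def verify(last_tok, candidate):
    """Accept the longest prefix of candidate that greedy decoding confirms.
    In a real model, all positions are checked in one parallel forward pass."""
    accepted = []
    for tok in candidate:
        if greedy_next(last_tok) != tok:
            break
        accepted.append(tok)
        last_tok = tok
    return accepted
```

The key property is that verification never changes the output, so a pool built from bad guesses costs only wasted checks, while a pool with a correct n-gram lets several tokens be accepted in one step.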