tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0

3.3: Learnable Sink Token #38

Open photomz opened 7 months ago

photomz commented 7 months ago

Kudos to the authors for open-sourcing a practical LLM chat improvement so quickly. In the preprint's Section 3.3, you experiment with:

prepending a learnable placeholder token (Sink Token) in all training samples.

Specifically, how do you add a "sink token"? Is it functionally different from GPT's \<startoftext> or Llama 2's [BOS] token, or does it need special handling in the training code? Releasing a code snippet of the Sink Token training would be great, thanks.
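
For reference, here's my rough guess of what that might look like with standard 🤗 Transformers APIs (the `<sink>` token name and the preprocessing are my own placeholders, not something from the paper or this repo):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register a dedicated sink token; its embedding would be learned during training.
tokenizer.add_special_tokens({"additional_special_tokens": ["<sink>"]})
model.resize_token_embeddings(len(tokenizer))

def preprocess(sample: str):
    # Prepend the sink token to every training sample so the model can learn
    # to park "excess" attention on it instead of on the first real token.
    return tokenizer("<sink>" + sample, truncation=True, max_length=2048)
```

Is that roughly right, or is the sink token handled differently from any other special token at training time?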

tomaarsen commented 7 months ago

Hello!

This is not the official repo for the paper, but rather work by an inspired fan :) If you would like the paper authors to see this, consider opening an issue in https://github.com/mit-han-lab/streaming-llm

I'm personally not quite sure which tokens in particular they used as placeholder tokens, but it's very possible that they're just regular tokens that are "considered" sink tokens simply by never letting the sliding window cache discard them.
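
As a minimal sketch of that idea (assuming the usual `[batch, num_heads, seq_len, head_dim]` KV cache layout; the function name and the default sizes are just illustrative, not the paper authors' code), the cache eviction would look something like this:

```python
import torch

def evict_from_cache(keys: torch.Tensor, values: torch.Tensor,
                     num_sink_tokens: int = 4, window_size: int = 1020):
    """keys/values have shape [batch, num_heads, seq_len, head_dim]."""
    seq_len = keys.size(2)
    if seq_len <= num_sink_tokens + window_size:
        return keys, values
    # Always keep the first few "sink" tokens plus the most recent window;
    # everything in between is discarded.
    keep = torch.cat([
        torch.arange(num_sink_tokens, device=keys.device),
        torch.arange(seq_len - window_size, seq_len, device=keys.device),
    ])
    return keys[:, :, keep], values[:, :, keep]
```

That is essentially what this repository does by patching the cache handling of existing models, so no retraining or extra token is needed.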