mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Typos in readme #4

Closed · gaotianyu1350 closed this issue 6 days ago

gaotianyu1350 commented 6 days ago

Great work! There is a typo in the README:

enable_duo_attention_eval(
    model,
    attn_heads,
    num_recent_tokens=64,
    num_sink_tokens=256,
)

The argument names look swapped: num_recent_tokens should be sink_size, and num_sink_tokens should be recent_size.
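
For reference, a minimal sketch of the corrected call, assuming the sink_size/recent_size parameter names suggested above (model and attn_heads set up as in the README):

# sink_size: number of initial attention-sink tokens kept by streaming heads (assumed semantics)
# recent_size: number of most recent tokens kept by streaming heads (assumed semantics)
enable_duo_attention_eval(
    model,
    attn_heads,
    sink_size=64,
    recent_size=256,
)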

Guangxuan-Xiao commented 6 days ago

Fixed. Thank you!