mit-han-lab / duo-attention

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
MIT License

chore: update llama.py #1

Closed — eltociear closed this pull request 5 days ago

eltociear commented 1 week ago

`continous` -> `continuous`