mit-han-lab / duo-attention
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
MIT License · 250 stars · 10 forks
Issues
#5 · Curious about the threshold tau selected to conduct experiment · BirdChristopher · closed 15 hours ago · 1 comment
#4 · Typos in readme · gaotianyu1350 · closed 6 days ago · 1 comment
#3 · Question about Sink+Sliding Window · namespace-Pt · closed 6 days ago · 1 comment
#2 · How long does calibration take to find Retrieval heads / attention patterns? · nicolefinnie · closed 6 days ago · 5 comments
#1 · chore: update llama.py · eltociear · closed 5 days ago · 0 comments