tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining.
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0 · 649 stars · 41 forks
Issues
#46 · Bump transformers from 4.34.0 to 4.38.0 · dependabot[bot] · opened 3 months ago · 0 comments
#45 · Last generated token getting ignored in streaming.py? · ritik99 · opened 3 months ago · 0 comments
#44 · Trying to install via Kaggle · Kuchiriel · closed 5 months ago · 1 comment
#43 · TypeError: 'NoneType' object is not subscriptable · Kuchiriel · opened 5 months ago · 0 comments
#42 · Support AutoGPTQ · Minami-su · opened 6 months ago · 0 comments
#41 · Support newer versions of mistral (e.g. mistralai/Mistral-7B-Instruct-v0.2)? · spring1915 · opened 6 months ago · 2 comments
#40 · chatglm3 support? · ScottishFold007 · opened 6 months ago · 0 comments
#39 · Bump transformers from 4.34.0 to 4.36.0 · dependabot[bot] · closed 6 months ago · 2 comments
#38 · 3.3: Learnable Sink Token · photomz · opened 6 months ago · 1 comment
#37 · KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' · pseudotensor · opened 6 months ago · 8 comments
#36 · Error when using Qwen 7b chat · Minami-su · opened 7 months ago · 1 comment
#35 · Error loading Qwen-1_8B · haiphong93 · opened 7 months ago · 0 comments
#34 · Generation stops; torch.cuda.OutOfMemoryError: CUDA out of memory · Essence9999 · opened 7 months ago · 0 comments
#33 · Update QWen due to changes in the modeling files of QWen-7b · tomaarsen · closed 7 months ago · 0 comments
#32 · ValueError: Attention Sinks does not support Flash Attention in QWen models, please use `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained` · Essence9999 · closed 7 months ago · 4 comments
#31 · GPTQ models support · synacktraa · opened 7 months ago · 5 comments
#30 · Flash Attention Support · Jiayuanhip · opened 7 months ago · 1 comment
#29 · Add BTLM support + benchmark results · tomaarsen · closed 7 months ago · 0 comments
#28 · Questions Related to the Application and Results of Attention Sinks After the Paper · dsdanielpark · closed 8 months ago · 2 comments
#27 · Add Yi support + benchmark results · MekkCyber · closed 7 months ago · 4 comments
#26 · Avoid overly strict "transformers==4.34.0" · pseudotensor · opened 8 months ago · 2 comments
#25 · Add exception for when FA is used with QWen · tomaarsen · closed 8 months ago · 0 comments
#24 · Error when using Qwen-14B · sun1092469590 · opened 8 months ago · 16 comments
#23 · Shrink attention_mask if it's larger than the cache · tomaarsen · closed 8 months ago · 4 comments
#22 · ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed · pseudotensor · closed 8 months ago · 10 comments
#21 · Bigcode architecture · selimsandal · closed 8 months ago · 1 comment
#20 · Add support for StableLM 3b 4e1t model · kmn1024 · closed 8 months ago · 1 comment
#19 · Strategy for trust_remote_code? · kmn1024 · closed 8 months ago · 1 comment
#18 · The results of sink/transformer/windowed under outputs_*/ folders are all the same · ZiweiHe · closed 8 months ago · 3 comments
#17 · Issue with only adding sink tokens in cache · sam1373 · opened 9 months ago · 4 comments
#16 · Completely refactor injection code · tomaarsen · closed 9 months ago · 0 comments
#15 · Add QWen model + benchmark results · Sanster · closed 8 months ago · 6 comments
#14 · Experiments with MPT7b with seqlen > 2048 · vchiley · opened 9 months ago · 4 comments
#13 · Add GPT-J support + benchmark results · tomaarsen · closed 9 months ago · 0 comments
#12 · Error when importing · Caet-pip · closed 8 months ago · 1 comment
#11 · Add support for GPT-J models · versae · closed 9 months ago · 2 comments
#10 · Add benchmarks comparing against Sliding Window Attention · casper-hansen · opened 9 months ago · 1 comment
#9 · Add cotributing.md · rajveer43 · opened 9 months ago · 0 comments
#8 · Error when using Falcon · helleuch · closed 9 months ago · 3 comments
#7 · Use with `pipeline` or `generate` · helleuch · closed 9 months ago · 2 comments
#6 · Add `model.generate` support · tomaarsen · closed 9 months ago · 3 comments
#5 · Add Mistral support + benchmark results · tomaarsen · closed 9 months ago · 0 comments
#4 · Add GPT-NeoX/Pythia support + benchmark results · tomaarsen · closed 9 months ago · 0 comments
#3 · Add MPT support + benchmark results · tomaarsen · closed 9 months ago · 0 comments
#2 · Add Falcon support + benchmark results · tomaarsen · closed 9 months ago · 0 comments
#1 · Trying a minimal example with LlamaForCasualLM, sadly it fails · alexbalandi · closed 9 months ago · 16 comments