tomaarsen attention_sinks issues

tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

https://huggingface.co/blog/tomaarsen/attention-sinks

Apache License 2.0

649 stars, 41 forks

issues

Bump transformers from 4.34.0 to 4.38.0

#46 dependabot[bot] opened 3 months ago · 0 comments
Last generated token getting ignored in streaming.py?

#45 ritik99 opened 3 months ago · 0 comments
Trying to install via Kaggle

#44 Kuchiriel closed 5 months ago · 1 comment
TypeError: 'NoneType' object is not subscriptable

#43 Kuchiriel opened 5 months ago · 0 comments
Support AutoGPTQ

#42 Minami-su opened 6 months ago · 0 comments
Support newer versions of mistral (e.g. mistralai/Mistral-7B-Instruct-v0.2)?

#41 spring1915 opened 6 months ago · 2 comments
chatglm3 support?

#40 ScottishFold007 opened 6 months ago · 0 comments
Bump transformers from 4.34.0 to 4.36.0

#39 dependabot[bot] closed 6 months ago · 2 comments
3.3: Learnable Sink Token

#38 photomz opened 6 months ago · 1 comment
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

#37 pseudotensor opened 6 months ago · 8 comments
Error when using Qwen 7b chat

#36 Minami-su opened 7 months ago · 1 comment
Error loading Qwen-1_8B

#35 haiphong93 opened 7 months ago · 0 comments
Generation stops; torch.cuda.OutOfMemoryError: CUDA out of memory.

#34 Essence9999 opened 7 months ago · 0 comments
Update QWen due to changes in the modeling files of QWen-7b

#33 tomaarsen closed 7 months ago · 0 comments
ValueError: Attention Sinks does not support Flash Attention in QWen models, please use `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained`.

#32 Essence9999 closed 7 months ago · 4 comments
GPTQ models support

#31 synacktraa opened 7 months ago · 5 comments
Flash Attention Support

#30 Jiayuanhip opened 7 months ago · 1 comment
Add BTLM support + benchmark results

#29 tomaarsen closed 7 months ago · 0 comments
Questions Related to the Application and Results of Attention Sinks After the Paper

#28 dsdanielpark closed 8 months ago · 2 comments
Add Yi support + benchmark results

#27 MekkCyber closed 7 months ago · 4 comments
Avoid overly strict "transformers==4.34.0"

#26 pseudotensor opened 8 months ago · 2 comments
Add exception for when FA is used with QWen

#25 tomaarsen closed 8 months ago · 0 comments
Error when using Qwen-14B

#24 sun1092469590 opened 8 months ago · 16 comments
Shrink attention_mask if it's larger than the cache

#23 tomaarsen closed 8 months ago · 4 comments
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

#22 pseudotensor closed 8 months ago · 10 comments
Bigcode architecture

#21 selimsandal closed 8 months ago · 1 comment
Add support for StableLM 3b 4e1t model

#20 kmn1024 closed 8 months ago · 1 comment
Strategy for trust_remote_code?

#19 kmn1024 closed 8 months ago · 1 comment
The results of sink/transformer/windowed under outputs_*/ folders are all the same

#18 ZiweiHe closed 8 months ago · 3 comments
Issue with only adding sink tokens in cache

#17 sam1373 opened 9 months ago · 4 comments
Completely refactor injection code

#16 tomaarsen closed 9 months ago · 0 comments
Add QWen model + benchmark results

#15 Sanster closed 8 months ago · 6 comments
Experiments with MPT7b with seqlen > 2048

#14 vchiley opened 9 months ago · 4 comments
Add GPT-J support + benchmark results

#13 tomaarsen closed 9 months ago · 0 comments
Error when importing

#12 Caet-pip closed 8 months ago · 1 comment
Add support for GPT-J models

#11 versae closed 9 months ago · 2 comments
Add benchmarks comparing against Sliding Window Attention

#10 casper-hansen opened 9 months ago · 1 comment
Add contributing.md

#9 rajveer43 opened 9 months ago · 0 comments
Error when using Falcon

#8 helleuch closed 9 months ago · 3 comments
Use with `pipeline` or `generate`

#7 helleuch closed 9 months ago · 2 comments
Add `model.generate` support

#6 tomaarsen closed 9 months ago · 3 comments
Add Mistral support + benchmark results

#5 tomaarsen closed 9 months ago · 0 comments
Add GPT-NeoX/Pythia support + benchmark results

#4 tomaarsen closed 9 months ago · 0 comments
Add MPT support + benchmark results

#3 tomaarsen closed 9 months ago · 0 comments
Add Falcon support + benchmark results

#2 tomaarsen closed 9 months ago · 0 comments
Trying a minimal example with LlamaForCausalLM, sadly it fails

#1 alexbalandi closed 9 months ago · 16 comments