# pytorch-labs/attention-gym

Helpful tools and examples for working with flex-attention.

BSD 3-Clause "New" or "Revised" License · 484 stars · 24 forks
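For orientation, here is a minimal sketch of the flex-attention API that this repo's tools and examples build on: a `mask_mod` is compiled into a `BlockMask` and passed to `flex_attention`. This is an illustrative sketch assuming PyTorch >= 2.5 on a CUDA device; the shapes and the causal mask are not code from this repo.

```python
# Minimal flex-attention sketch (assumes PyTorch >= 2.5 on a CUDA device;
# shapes and the causal mask_mod are illustrative, not taken from this repo).
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 2, 4, 1024, 64  # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

def causal(b, h, q_idx, kv_idx):
    # mask_mod returns True where a query position may attend to a key position
    return q_idx >= kv_idx

# B=None / H=None broadcast the same mask over batch and heads
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=block_mask)
```

In practice `flex_attention` is usually wrapped in `torch.compile` for performance, which is the subject of several of the issues below (e.g. #35, #33, #28).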
## Issues (newest first)
| # | Title | Author | Status | Comments |
|---:|---|---|---|---:|
| #35 | Clarification on torch.compile behavior with flex_attention | kebijuelun | closed 2 months ago | 1 |
| #34 | How to avoid recomputing the mask | NonvolatileMemory | opened 2 months ago | 7 |
| #33 | Dynamic shape compilation support for flex attention with block mask | SamGalanakis | opened 3 months ago | 1 |
| #32 | Question about OOM on large sequences | foreverpiano | closed 2 months ago | 8 |
| #31 | Support varied input sequence lengths with a fixed block mask | tilmto | opened 3 months ago | 5 |
| #30 | Integration with Hugging Face Transformers | buaacyw | closed 2 months ago | 1 |
| #29 | It seems that `visualize_attention_scores` can only visualize either mask-mod-only or score-mod-only | XinDongol | opened 3 months ago | 2 |
| #28 | How to reuse the compiled kernel instead of recompiling flex_attention when the mask and query/key/value shapes are unchanged? | foreverpiano | closed 3 months ago | 4 |
| #27 | Why does flex-attention run faster than FA2 in attention-gym but much slower than FA2 in a real environment? | foreverpiano | closed 3 months ago | 2 |
| #26 | `error: 'tt.broadcast' op requires the same encoding for all operands and results` for local window attention | fteufel | opened 3 months ago | 15 |
| #25 | Does FlexAttention support torch.vmap? | MiladInk | opened 3 months ago | 3 |
| #24 | Use hatch-vcs for versioning | drisspg | closed 3 months ago | 0 |
| #23 | Update softcapping.py | Chillee | closed 3 months ago | 0 |
| #22 | [flex_attention] Softcap perf questions | meshtag | opened 3 months ago | 6 |
| #21 | Are V100 GPUs supported? | boren-ms | opened 3 months ago | 5 |
| #20 | Bias gradient support? | ardagoreci | opened 3 months ago | 11 |
| #19 | Writing to a globally scoped tensor from a score_mod function | jeffwillette | opened 3 months ago | 1 |
| #18 | Paged attention | kme2698 | opened 3 months ago | 1 |
| #17 | Thank you for the awesome work! I saw from the blog that paged attention can also be implemented with flex attention. | kme2698 | closed 3 months ago | 0 |
| #16 | NATTEN example | Birch-san | closed 3 months ago | 0 |
| #15 | CUDA OOM with sliding window attention | mishooax | closed 3 months ago | 6 |
| #14 | Shared memory out of resource | TechxGenus | opened 3 months ago | 2 |
| #13 | Fix a typo | sangyeon-k | closed 3 months ago | 0 |
| #12 | Add Dilated Sliding Window mask_mod | sangyeon-k | closed 2 days ago | 2 |
| #11 | 2D NATTEN typo? | zaptrem | closed 3 months ago | 1 |
| #10 | [question] Possible to implement Attention Steering? | GindaChen | opened 3 months ago | 3 |
| #9 | Compatibility with AMD GPUs | vinay-swamy | opened 3 months ago | 1 |
| #8 | Add Graphormer mod | stsouko | opened 3 months ago | 1 |
| #7 | Remove gh release | drisspg | closed 3 months ago | 0 |
| #6 | Add PyPI tag publish flow | drisspg | closed 3 months ago | 0 |
| #5 | Publish on PyPI? | Ryu1845 | closed 3 months ago | 2 |
| #4 | Fix link in README.md | zinccat | closed 3 months ago | 0 |
| #3 | Add Global + Sliding Window | drisspg | opened 3 months ago | 0 |
| #2 | Add Dilated Sliding Window mask_mod | drisspg | opened 3 months ago | 0 |
| #1 | Add Sandwich Score Mod | drisspg | opened 3 months ago | 0 |