pytorch-labs/attention-gym
Helpful tools and examples for working with flex-attention
BSD 3-Clause "New" or "Revised" License
433 stars, 22 forks
Issues
#65 Selection of BLOCK_SIZE in create_block_mask (tsrikris, opened 4 days ago, 1 comment)
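For background on #65: create_block_mask takes a BLOCK_SIZE keyword that sets the block-sparsity granularity. A minimal sketch, assuming a simple causal mask_mod; the 4096 lengths and the 128 value are illustrative, not taken from the issue:

```python
from torch.nn.attention.flex_attention import create_block_mask

def causal_mask(b, h, q_idx, kv_idx):
    # mask_mod: return True where attention is allowed.
    return q_idx >= kv_idx

# BLOCK_SIZE sets the granularity of the block-sparse mask (128 by default);
# a (Q_BLOCK_SIZE, KV_BLOCK_SIZE) tuple is also accepted.
block_mask = create_block_mask(
    causal_mask, None, None, 4096, 4096, device="cuda", BLOCK_SIZE=128
)
```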
#64 Fix typo in readme (drisspg, closed 4 days ago, 0 comments)
#63 How to reason about efficiency of different score/mask mod functions (alex-hh, opened 5 days ago, 2 comments)
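Relevant to #63: a hard mask expressed as a mask_mod plus BlockMask lets fully masked blocks be skipped, whereas the same logic written as a score_mod still computes every score before masking it out. A sketch contrasting the two forms for causal attention:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

# As a score_mod, every score is computed and then set to -inf where disallowed.
def causal_score_mod(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

# As a mask_mod, the resulting BlockMask lets fully masked blocks be skipped,
# which is generally the preferred form for hard masks.
def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

block_mask = create_block_mask(causal_mask_mod, None, None, 2048, 2048)
```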
#62 FlexAttention Output Differs from SDPA (chayut-t, opened 5 days ago, 0 comments)
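A minimal comparison harness in the spirit of #62 (shapes, dtype, and tolerances are illustrative, not taken from the issue): with no score_mod or block_mask, flex_attention should agree with scaled_dot_product_attention up to kernel-level floating-point differences:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

out_flex = flex_attention(q, k, v)                   # no score_mod / block_mask
out_sdpa = F.scaled_dot_product_attention(q, k, v)   # reference

# Different kernels: expect small numerical differences, not bit-exact equality.
torch.testing.assert_close(out_flex, out_sdpa, atol=2e-3, rtol=2e-3)
```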
#61 Fix Format (drisspg, closed 6 days ago, 0 comments)
#60 How to do KV Cache with FlexAttention and BlockMask by slicing? (Leo-T-Zang, opened 1 week ago, 3 comments)
#59 A simple adaptation to JAX (zinccat, opened 1 week ago, 3 comments)
#58 What is the best practice to save and load a BlockMask object? (complexfilter, opened 1 week ago, 1 comment)
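One common workaround for the question in #58, sketched here without claiming it is the maintainers' recommendation: persist whatever defines the mask and rebuild the BlockMask with create_block_mask at load time, rather than serializing the object itself:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

def causal_mask(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Save only what defines the mask (here just the sequence lengths)...
torch.save({"q_len": 4096, "kv_len": 4096}, "mask_cfg.pt")

# ...and reconstruct the BlockMask when loading.
cfg = torch.load("mask_cfg.pt")
block_mask = create_block_mask(causal_mask, None, None, cfg["q_len"], cfg["kv_len"])
```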
#57 Fix minor typo in example (zinccat, closed 6 days ago, 0 comments)
#56 Optimal ordering with block mask (francois-rozet, opened 1 week ago, 8 comments)
#55 Typo in README example (francois-rozet, closed 4 days ago, 2 comments)
#54 What is the expected GPU memory and performance drop w.r.t. FlashAttention with block masks? (arilato, opened 1 week ago, 2 comments)
#53 CheckpointFunction with Flex-attn BlockMask (StanLei52, closed 1 week ago, 2 comments)
#52 Fix import of transformmodindex on nightly (drisspg, closed 1 week ago, 0 comments)
#51 [FlexAttention] Using FlexAttention with DDP complains about a "higher order optimizer" (moinnadeem, closed 2 weeks ago, 1 comment)
#50 FlexAttention results do not match FlashAttention results (tilmto, opened 3 weeks ago, 3 comments)
#49 Performance dependent on GPU type? (alex-hh, opened 3 weeks ago, 2 comments)
#48 Add explicit statement in README.md for installing the required version of PyTorch (sachinkadyan7, closed 4 weeks ago, 3 comments)
#47 Add explicit code statement in README for installing PyTorch 2.5 (currently only available as nightly) (sachinkadyan7, closed 4 weeks ago, 2 comments)
#46 Replacing attention implementation with FlexAttention seems to break Llama3 inference (kyleliang919, closed 1 month ago, 6 comments)
#45 Two errors: (1) NameError: ModularIndexing is not defined & (2) LoweringException: AttributeError: 'View' object has no attribute 'get_stride' (tobiasvanderwerff, opened 1 month ago, 10 comments)
#44 Distributed Attention Methods (tsrikris, opened 1 month ago, 1 comment)
#43 CUDA OOM Issue When Using Approx Tanh with softcapping score mod (kebijuelun, opened 1 month ago, 3 comments)
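For reference on the score_mod behind #43, a standard tanh soft-capping mod looks roughly like this; SOFT_CAP = 30.0 and the tensor shapes are illustrative, and the issue itself concerns an approximate-tanh variant of this mod:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

SOFT_CAP = 30.0  # illustrative value, not taken from the issue

def tanh_softcap(score, b, h, q_idx, kv_idx):
    # Squash raw attention scores into (-SOFT_CAP, SOFT_CAP) before softmax.
    return SOFT_CAP * torch.tanh(score / SOFT_CAP)

q = k = v = torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
out = flex_attention(q, k, v, score_mod=tanh_softcap)
```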
#42 [Feature request] End-to-end transformer example with flex attention (vladkvit, opened 1 month ago, 1 comment)
#41 Question about FlexAttention for Tabular Data (RaphaelMouravieff, closed 1 month ago, 0 comments)
#40 Creating block mask with mask mod and _compile=True (johng149, closed 4 weeks ago, 3 comments)
#39 [Feature request] How to merge BlockMasks? (foreverpiano, closed 3 weeks ago, 2 comments)
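Related to #39: rather than merging BlockMask objects directly, mask_mods can be combined with and_masks or or_masks and then materialized once. A sketch assuming a causal-plus-sliding-window combination with an illustrative window of 256:

```python
from torch.nn.attention.flex_attention import and_masks, create_block_mask

WINDOW = 256  # illustrative

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

def sliding_window(b, h, q_idx, kv_idx):
    return q_idx - kv_idx <= WINDOW

# Combine the predicates, then build a single BlockMask from the result
# (here: causal sliding-window attention).
merged_mod = and_masks(causal, sliding_window)
block_mask = create_block_mask(merged_mod, None, None, 4096, 4096)
```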
#38 Padding mask for BERT (kchl5, closed 1 month ago, 7 comments)
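For the use case in #38, a padding mask_mod typically closes over a per-sequence length (or boolean) tensor. A minimal sketch for bidirectional, BERT-style attention with right padding; shapes and lengths are illustrative:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

B, SEQ_LEN = 4, 512
lengths = torch.tensor([512, 300, 128, 57], device="cuda")  # valid tokens per sequence

def padding_mask(b, h, q_idx, kv_idx):
    # Attend only within the non-padded region of each sequence; no causal constraint.
    return (q_idx < lengths[b]) & (kv_idx < lengths[b])

block_mask = create_block_mask(padding_mask, B, None, SEQ_LEN, SEQ_LEN)
```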
#37 Returned lse elements are all 0s (311dada, closed 1 month ago, 3 comments)
#36 torch.nn.attention folder seems to be missing on torch 2.5 (kchl5, closed 1 month ago, 3 comments)
#35 Clarification on torch.compile behavior with flex_attention (kebijuelun, closed 1 month ago, 1 comment)
#34 How to avoid recomputing the mask (NonvolatileMemory, opened 1 month ago, 6 comments)
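On the theme of #34: a BlockMask depends only on the mask_mod and the mask shape, so the usual pattern is to build it once and reuse the same object across calls. A sketch assuming a static sequence length:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Built once, outside the loop; reused while shapes stay the same.
block_mask = create_block_mask(causal, None, None, 2048, 2048, device="cuda")

for _ in range(4):  # stand-in for a training/inference loop
    q = k = v = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
    out = flex_attention(q, k, v, block_mask=block_mask)
```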
#33 Dynamic shape compilation support for flex attention with block mask (SamGalanakis, opened 2 months ago, 1 comment)
#32 Question about OOM on large sequences (foreverpiano, closed 1 month ago, 8 comments)
#31 Support varied input sequence lengths with a fixed block mask (tilmto, opened 2 months ago, 5 comments)
#30 Integration with Hugging Face Transformers (buaacyw, closed 2 months ago, 1 comment)
#29 It seems that `visualize_attention_scores` can only visualize either mask-mod-only or score-mod-only (XinDongol, opened 2 months ago, 2 comments)
#28 How to reuse the compiled kernel instead of recompiling flex_attention when the mask and query/key/value shapes are unchanged? (foreverpiano, closed 2 months ago, 4 comments)
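Related to #28: compiling flex_attention once at module scope and calling the same compiled object should let torch.compile reuse its cached kernel as long as shapes, mods, and the BlockMask layout are unchanged. A minimal sketch, not taken from the issue's resolution:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compile once and keep calling the same compiled object; torch.compile
# recompiles only when guards (shapes, mods, mask layout) change.
compiled_flex = torch.compile(flex_attention, dynamic=False)

def attend(q, k, v, block_mask):
    return compiled_flex(q, k, v, block_mask=block_mask)
```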
#27 Why does flex-attention run faster than FA2 in attention-gym but slower than FA2 in a real environment? (foreverpiano, closed 2 months ago, 2 comments)
#26 `error: 'tt.broadcast' op requires the same encoding for all operands and results` for local window attention (fteufel, opened 2 months ago, 15 comments)
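The mask_mod behind the local-window pattern in #26 is itself straightforward (the `tt.broadcast` error in the title appears to be a Triton compilation error, not a problem with the mod). A sketch with an illustrative two-sided window of 128:

```python
from torch.nn.attention.flex_attention import create_block_mask

WINDOW = 128  # illustrative half-window size

def local_window(b, h, q_idx, kv_idx):
    # Each query attends to keys within +/- WINDOW positions.
    return (q_idx - kv_idx).abs() <= WINDOW

block_mask = create_block_mask(local_window, None, None, 4096, 4096, device="cuda")
```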
#25 Does FlexAttention Support torch.vmap? (MiladInk, opened 2 months ago, 3 comments)
#24 Use hatch vcs for versioning (drisspg, closed 2 months ago, 0 comments)
#23 Update softcapping.py (Chillee, closed 2 months ago, 0 comments)
#22 [flex_attention] Softcap perf questions (meshtag, opened 2 months ago, 6 comments)
#21 V100 GPUs supported? (boren-ms, opened 2 months ago, 5 comments)
#20 Bias gradient support? (ardagoreci, opened 2 months ago, 8 comments)
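Context for #20: a score_mod can read from a captured bias tensor; whether gradients flow back into that tensor is exactly what the issue asks. A hedged sketch of the forward setup, with illustrative shapes:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

H, SEQ = 8, 1024
# A captured, learnable bias tensor; #20 asks whether gradients propagate into it.
bias = torch.zeros(H, SEQ, SEQ, device="cuda", requires_grad=True)

def bias_mod(score, b, h, q_idx, kv_idx):
    return score + bias[h, q_idx, kv_idx]

q = k = v = torch.randn(1, H, SEQ, 64, device="cuda")
out = flex_attention(q, k, v, score_mod=bias_mod)
```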
#19 Writing to a globally scoped tensor from score_mod function (jeffwillette, opened 2 months ago, 1 comment)
#18 Paged attention (kme2698, opened 2 months ago, 1 comment)
#17 Thank you for the awesome work! I saw from the blog that paged attention can also be implemented with flex attention. (kme2698, closed 2 months ago, 0 comments)
#16 NATTEN example (Birch-san, closed 2 months ago, 0 comments)