pytorch-labs / attention-gym
Helpful tools and examples for working with flex-attention
BSD 3-Clause "New" or "Revised" License · 475 stars · 23 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
| --- | --- | --- | --- | --- |
| #84 | Examples of training bias | drisspg | opened 1 hour ago | 0 |
| #83 | Weird warning on compile: `SingleProcess AUTOTUNE benchmarking takes...` | ViktorooReps | closed 2 days ago | 3 |
| #82 | Support for Group Query Attention | tugot17 | closed 3 days ago | 0 |
| #81 | flexattn with qwen2 | NonvolatileMemory | opened 4 days ago | 4 |
| #80 | What is the best way to deal with padding and cached KV? | ViktorooReps | closed 3 days ago | 2 |
| #79 | Weird benchmarking results: FlexAttention vs SDPA | ViktorooReps | closed 5 days ago | 1 |
| #78 | FlexAttention BlockMask creation in DataLoader | ViktorooReps | closed 6 days ago | 3 |
| #77 | Flex attention with dropout | zbh2047 | opened 1 week ago | 3 |
| #76 | Flex attention - gaps in profiler | tugot17 | opened 1 week ago | 7 |
| #75 | Rope2d | bhack | opened 1 week ago | 7 |
| #74 | How to implement Bidirectional Alibi with padding using flex attention? | sphmel | opened 2 weeks ago | 2 |
| #73 | Is there any chance to call backward function dircetly instead of using pytorch autograd mechanism? | MayDomine | opened 2 weeks ago | 3 |
| #72 | ; | Sierkinhane | closed 2 weeks ago | 0 |
| #71 | Block Size when Q_LEN and KV_LEN are different | johng149 | opened 2 weeks ago | 0 |
| #70 | NotImplementedError: There was no rule registered for HOP flex_attention and mode | LeoXinhaoLee | opened 2 weeks ago | 2 |
| #69 | AssertionError: Captured buffers that require grad are not yet supported. | pengzhenghao | closed 1 day ago | 1 |
| #68 | Document masking does not work for small number of tokens? | xidulu | closed 3 weeks ago | 6 |
| #67 | Test with random cross attention | ssmmnn11 | closed 3 weeks ago | 1 |
| #66 | How to manually check if one position or row has correct masking? | Leo-T-Zang | opened 3 weeks ago | 2 |
| #65 | Selection of BLOCK_SIZE in create_block_mask | tsrikris | opened 1 month ago | 1 |
| #64 | Fix typo in readme | drisspg | closed 1 month ago | 0 |
| #63 | How to reason about efficiency of different score/mask mod functions | alex-hh | opened 1 month ago | 3 |
| #62 | FlexAttention Output Differs from SDPA | chayut-t | opened 1 month ago | 4 |
| #61 | Fix Format | drisspg | closed 1 month ago | 0 |
| #60 | How to do KV Cache with FlexAttention and BlockMask by slicing? | Leo-T-Zang | opened 1 month ago | 4 |
| #59 | A simple adaption to Jax | zinccat | opened 1 month ago | 3 |
| #58 | What is the best practice to save and load a BlockMask object? | complexfilter | opened 1 month ago | 1 |
| #57 | Fix minor typo in example | zinccat | closed 1 month ago | 0 |
| #56 | Optimal ordering with block mask | francois-rozet | opened 1 month ago | 9 |
| #55 | Typo in README example | francois-rozet | closed 1 month ago | 2 |
| #54 | What is the expected gpu memory performance drop wrt flash attention with block masks? | arilato | opened 1 month ago | 2 |
| #53 | CheckpointFunction with Flex-attn BlockMask | StanLei52 | closed 1 month ago | 2 |
| #52 | Fix import of transformmodindex on nightly | drisspg | closed 1 month ago | 0 |
| #51 | [FlexAttention] Using FlexAttention with DDP complains about a "higher order optimizer" | moinnadeem | closed 1 month ago | 1 |
| #50 | FlexAttention results do not match FlashAttention results | tilmto | opened 1 month ago | 3 |
| #49 | Performance dependent on GPU type? | alex-hh | closed 3 weeks ago | 2 |
| #48 | Add explicit statement in README.md for installing the required version of PyTorch | sachinkadyan7 | closed 1 month ago | 3 |
| #47 | Add explicit code statement in README for installing PyTorch 2.5 (currently only available as nightly) | sachinkadyan7 | closed 1 month ago | 2 |
| #46 | Replacing attention implementation with FlexAttention seems to break Llama3 inference | kyleliang919 | closed 1 month ago | 6 |
| #45 | Two errors: (1) NameError: ModularIndexing is not defined & (2) LoweringException: AttributeError: 'View' object has no attribute 'get_stride' | tobiasvanderwerff | opened 2 months ago | 10 |
| #44 | Distributed Attention Methods | tsrikris | opened 2 months ago | 2 |
| #43 | CUDA OOM Issue When Using Approx Tanh with softcapping score mod | kebijuelun | opened 2 months ago | 5 |
| #42 | [Feature request] End-to-end transformer example with flex attention | vladkvit | opened 2 months ago | 1 |
| #41 | Question about FlexAttention for Tabular Data | RaphaelMouravieff | closed 2 months ago | 0 |
| #40 | Creating block mask with mask mod and _compile=True | johng149 | closed 1 month ago | 3 |
| #39 | [Feature request] how to merge Blockmask? | foreverpiano | closed 1 month ago | 2 |
| #38 | Padding mask for BERT | kchl5 | closed 2 months ago | 7 |
| #37 | Returned lse elements are all 0s | 311dada | closed 2 months ago | 3 |
| #36 | torch.nn.attention folder seems to be missing on torch 2.5 | kchl5 | closed 2 months ago | 3 |
| #35 | Clarification on torch.compile behavior with flex_attention | kebijuelun | closed 2 months ago | 1 |