pytorch-labs/attention-gym
Helpful tools and examples for working with flex-attention
BSD 3-Clause "New" or "Revised" License
433 stars, 22 forks
Issues
#65 Selection of BLOCK_SIZE in create_block_mask (tsrikris, opened 4 days ago, 1 comment)
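For background on #65: create_block_mask takes a BLOCK_SIZE keyword that sets the block-sparsity granularity. A minimal sketch, assuming a simple causal mask_mod; the 4096 lengths and the 128 value are illustrative, not taken from the issue:

```python
from torch.nn.attention.flex_attention import create_block_mask

def causal_mask(b, h, q_idx, kv_idx):
    # mask_mod: return True where attention is allowed.
    return q_idx >= kv_idx

# BLOCK_SIZE sets the granularity of the block-sparse mask (128 by default);
# a (Q_BLOCK_SIZE, KV_BLOCK_SIZE) tuple is also accepted.
block_mask = create_block_mask(
    causal_mask, None, None, 4096, 4096, device="cuda", BLOCK_SIZE=128
)
```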
#64 Fix typo in readme (drisspg, closed 4 days ago, 0 comments)
#63 How to reason about efficiency of different score/mask mod functions (alex-hh, opened 5 days ago, 2 comments)
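Relevant to #63: a hard mask expressed as a mask_mod plus BlockMask lets fully masked blocks be skipped, whereas the same logic written as a score_mod still computes every score before masking it out. A sketch contrasting the two forms for causal attention:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

# As a score_mod, every score is computed and then set to -inf where disallowed.
def causal_score_mod(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

# As a mask_mod, the resulting BlockMask lets fully masked blocks be skipped,
# which is generally the preferred form for hard masks.
def causal_mask_mod(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

block_mask = create_block_mask(causal_mask_mod, None, None, 2048, 2048)
```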
#62 FlexAttention Output Differs from SDPA (chayut-t, opened 5 days ago, 0 comments)
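A minimal comparison harness in the spirit of #62 (shapes, dtype, and tolerances are illustrative, not taken from the issue): with no score_mod or block_mask, flex_attention should agree with scaled_dot_product_attention up to kernel-level floating-point differences:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

out_flex = flex_attention(q, k, v)                   # no score_mod / block_mask
out_sdpa = F.scaled_dot_product_attention(q, k, v)   # reference

# Different kernels: expect small numerical differences, not bit-exact equality.
torch.testing.assert_close(out_flex, out_sdpa, atol=2e-3, rtol=2e-3)
```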
#61 Fix Format (drisspg, closed 6 days ago, 0 comments)
#60 How to do KV Cache with FlexAttention and BlockMask by slicing? (Leo-T-Zang, opened 1 week ago, 3 comments)
#59 A simple adaptation to JAX (zinccat, opened 1 week ago, 3 comments)
#58 What is the best practice to save and load a BlockMask object? (complexfilter, opened 1 week ago, 1 comment)
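One common workaround for the question in #58, sketched here without claiming it is the maintainers' recommendation: persist whatever defines the mask and rebuild the BlockMask with create_block_mask at load time, rather than serializing the object itself:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

def causal_mask(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Save only what defines the mask (here just the sequence lengths)...
torch.save({"q_len": 4096, "kv_len": 4096}, "mask_cfg.pt")

# ...and reconstruct the BlockMask when loading.
cfg = torch.load("mask_cfg.pt")
block_mask = create_block_mask(causal_mask, None, None, cfg["q_len"], cfg["kv_len"])
```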
#57 Fix minor typo in example (zinccat, closed 6 days ago, 0 comments)
#56 Optimal ordering with block mask (francois-rozet, opened 1 week ago, 8 comments)
#55 Typo in README example (francois-rozet, closed 4 days ago, 2 comments)
#54 What is the expected GPU memory and performance drop w.r.t. FlashAttention with block masks? (arilato, opened 1 week ago, 2 comments)
#53 CheckpointFunction with Flex-attn BlockMask (StanLei52, closed 1 week ago, 2 comments)
#52 Fix import of transformmodindex on nightly (drisspg, closed 1 week ago, 0 comments)
#51 [FlexAttention] Using FlexAttention with DDP complains about a "higher order optimizer" (moinnadeem, closed 2 weeks ago, 1 comment)
#50 FlexAttention results do not match FlashAttention results (tilmto, opened 3 weeks ago, 3 comments)
#49 Performance dependent on GPU type? (alex-hh, opened 3 weeks ago, 2 comments)
#48 Add explicit statement in README.md for installing the required version of PyTorch (sachinkadyan7, closed 4 weeks ago, 3 comments)
#47 Add explicit code statement in README for installing PyTorch 2.5 (currently only available as nightly) (sachinkadyan7, closed 4 weeks ago, 2 comments)
#46 Replacing attention implementation with FlexAttention seems to break Llama3 inference (kyleliang919, closed 1 month ago, 6 comments)
#45 Two errors: (1) NameError: ModularIndexing is not defined & (2) LoweringException: AttributeError: 'View' object has no attribute 'get_stride' (tobiasvanderwerff, opened 1 month ago, 10 comments)
#44 Distributed Attention Methods (tsrikris, opened 1 month ago, 1 comment)
#43 CUDA OOM Issue When Using Approx Tanh with softcapping score mod (kebijuelun, opened 1 month ago, 3 comments)
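For reference on the score_mod behind #43, a standard tanh soft-capping mod looks roughly like this; SOFT_CAP = 30.0 and the tensor shapes are illustrative, and the issue itself concerns an approximate-tanh variant of this mod:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

SOFT_CAP = 30.0  # illustrative value, not taken from the issue

def tanh_softcap(score, b, h, q_idx, kv_idx):
    # Squash raw attention scores into (-SOFT_CAP, SOFT_CAP) before softmax.
    return SOFT_CAP * torch.tanh(score / SOFT_CAP)

q = k = v = torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
out = flex_attention(q, k, v, score_mod=tanh_softcap)
```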
#42 [Feature request] End-to-end transformer example with flex attention (vladkvit, opened 1 month ago, 1 comment)
#41 Question about FlexAttention for Tabular Data (RaphaelMouravieff, closed 1 month ago, 0 comments)
#40 Creating block mask with mask mod and _compile=True (johng149, closed 4 weeks ago, 3 comments)
#39 [Feature request] How to merge BlockMasks? (foreverpiano, closed 3 weeks ago, 2 comments)
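Related to #39: rather than merging BlockMask objects directly, mask_mods can be combined with and_masks or or_masks and then materialized once. A sketch assuming a causal-plus-sliding-window combination with an illustrative window of 256:

```python
from torch.nn.attention.flex_attention import and_masks, create_block_mask

WINDOW = 256  # illustrative

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

def sliding_window(b, h, q_idx, kv_idx):
    return q_idx - kv_idx <= WINDOW

# Combine the predicates, then build a single BlockMask from the result
# (here: causal sliding-window attention).
merged_mod = and_masks(causal, sliding_window)
block_mask = create_block_mask(merged_mod, None, None, 4096, 4096)
```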
#38 Padding mask for BERT (kchl5, closed 1 month ago, 7 comments)
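For the use case in #38, a padding mask_mod typically closes over a per-sequence length (or boolean) tensor. A minimal sketch for bidirectional, BERT-style attention with right padding; shapes and lengths are illustrative:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

B, SEQ_LEN = 4, 512
lengths = torch.tensor([512, 300, 128, 57], device="cuda")  # valid tokens per sequence

def padding_mask(b, h, q_idx, kv_idx):
    # Attend only within the non-padded region of each sequence; no causal constraint.
    return (q_idx < lengths[b]) & (kv_idx < lengths[b])

block_mask = create_block_mask(padding_mask, B, None, SEQ_LEN, SEQ_LEN)
```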
#37 Returned lse elements are all 0s (311dada, closed 1 month ago, 3 comments)
#36 torch.nn.attention folder seems to be missing on torch 2.5 (kchl5, closed 1 month ago, 3 comments)
#35 Clarification on torch.compile behavior with flex_attention (kebijuelun, closed 1 month ago, 1 comment)
#34 How to avoid recomputing the mask (NonvolatileMemory, opened 1 month ago, 6 comments)
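On the theme of #34: a BlockMask depends only on the mask_mod and the mask shape, so the usual pattern is to build it once and reuse the same object across calls. A sketch assuming a static sequence length:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Built once, outside the loop; reused while shapes stay the same.
block_mask = create_block_mask(causal, None, None, 2048, 2048, device="cuda")

for _ in range(4):  # stand-in for a training/inference loop
    q = k = v = torch.randn(2, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
    out = flex_attention(q, k, v, block_mask=block_mask)
```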
#33 Dynamic shape compilation support for flex attention with block mask (SamGalanakis, opened 2 months ago, 1 comment)
#32 Question about OOM on large sequences (foreverpiano, closed 1 month ago, 8 comments)
#31 Support varied input sequence lengths with a fixed block mask (tilmto, opened 2 months ago, 5 comments)
#30 Integration with Hugging Face Transformers (buaacyw, closed 2 months ago, 1 comment)
#29 It seems that `visualize_attention_scores` can only visualize either mask-mod-only or score-mod-only (XinDongol, opened 2 months ago, 2 comments)
#28 How to reuse the compiled kernel instead of recompiling flex_attention when the mask and query/key/value shapes are unchanged? (foreverpiano, closed 2 months ago, 4 comments)
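Related to #28: compiling flex_attention once at module scope and calling the same compiled object should let torch.compile reuse its cached kernel as long as shapes, mods, and the BlockMask layout are unchanged. A minimal sketch, not taken from the issue's resolution:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compile once and keep calling the same compiled object; torch.compile
# recompiles only when guards (shapes, mods, mask layout) change.
compiled_flex = torch.compile(flex_attention, dynamic=False)

def attend(q, k, v, block_mask):
    return compiled_flex(q, k, v, block_mask=block_mask)
```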
#27 Why does flex-attention run faster than FA2 in attention-gym but slower than FA2 in a real environment? (foreverpiano, closed 2 months ago, 2 comments)
#26 `error: 'tt.broadcast' op requires the same encoding for all operands and results` for local window attention (fteufel, opened 2 months ago, 15 comments)
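The mask_mod behind the local-window pattern in #26 is itself straightforward (the `tt.broadcast` error in the title appears to be a Triton compilation error, not a problem with the mod). A sketch with an illustrative two-sided window of 128:

```python
from torch.nn.attention.flex_attention import create_block_mask

WINDOW = 128  # illustrative half-window size

def local_window(b, h, q_idx, kv_idx):
    # Each query attends to keys within +/- WINDOW positions.
    return (q_idx - kv_idx).abs() <= WINDOW

block_mask = create_block_mask(local_window, None, None, 4096, 4096, device="cuda")
```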
#25 Does FlexAttention Support torch.vmap? (MiladInk, opened 2 months ago, 3 comments)
#24 Use hatch vcs for versioning (drisspg, closed 2 months ago, 0 comments)
#23 Update softcapping.py (Chillee, closed 2 months ago, 0 comments)
#22 [flex_attention] Softcap perf questions (meshtag, opened 2 months ago, 6 comments)
#21 V100 GPUs supported? (boren-ms, opened 2 months ago, 5 comments)
#20 Bias gradient support? (ardagoreci, opened 2 months ago, 8 comments)
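Context for #20: a score_mod can read from a captured bias tensor; whether gradients flow back into that tensor is exactly what the issue asks. A hedged sketch of the forward setup, with illustrative shapes:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

H, SEQ = 8, 1024
# A captured, learnable bias tensor; #20 asks whether gradients propagate into it.
bias = torch.zeros(H, SEQ, SEQ, device="cuda", requires_grad=True)

def bias_mod(score, b, h, q_idx, kv_idx):
    return score + bias[h, q_idx, kv_idx]

q = k = v = torch.randn(1, H, SEQ, 64, device="cuda")
out = flex_attention(q, k, v, score_mod=bias_mod)
```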
#19 Writing to a globally scoped tensor from score_mod function (jeffwillette, opened 2 months ago, 1 comment)
#18 Paged attention (kme2698, opened 2 months ago, 1 comment)
#17 Thank you for the awesome work! I saw from the blog that paged attention can also be implemented with flex attention. (kme2698, closed 2 months ago, 0 comments)
#16 NATTEN example (Birch-san, closed 2 months ago, 0 comments)