pytorch-labs/tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
BSD 3-Clause "New" or "Revised" License · 21 stars · 3 forks
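
To make the one-line description above concrete, here is a minimal sketch of the kind of per-operator measurement tritonbench automates. This is not tritonbench's own harness: it only uses `triton.testing.do_bench`, a standard Triton utility, on a plain PyTorch matmul, and the shapes, dtype, and TFLOPS derivation are arbitrary assumptions for illustration.

```python
# Minimal sketch (NOT tritonbench's actual harness) of benchmarking one
# "operator with an example input": a float16 gemm timed with Triton's
# do_bench utility. Assumes a CUDA GPU; shapes and dtype are arbitrary.
import torch
from triton.testing import do_bench

M = N = K = 4096  # hypothetical example input size
a = torch.randn(M, K, device="cuda", dtype=torch.float16)
b = torch.randn(K, N, device="cuda", dtype=torch.float16)

# do_bench warms up, runs the callable repeatedly, and returns latency in
# milliseconds (mean or median, depending on the Triton version).
ms = do_bench(lambda: torch.matmul(a, b))

# A gemm performs 2*M*N*K floating-point operations.
tflops = (2 * M * N * K) / (ms * 1e-3) / 1e12
print(f"latency: {ms:.3f} ms  throughput: {tflops:.1f} TFLOPS")
```

Latency, TFLOPS, and memory metrics of exactly this kind recur throughout the issue list below (e.g. #50, #42, #34).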
Issues (newest first)
#81 Install tritonbench as a library (by xuzhao9, opened 6 hours ago, 0 comments)
#80 changing hw rooflines to match xformers (by adamomainz, closed 5 hours ago, 4 comments)
#79 Build the nightly workflow (by xuzhao9, opened 9 hours ago, 0 comments)
#78 [GPU_UTILS] H100 specs (by antferdom, closed 2 hours ago, 6 comments)
#77 [FA] add persistent variant (by manman-ren, opened 1 day ago, 2 comments)
#76 Ops bug fix and args clean (by FindHao, closed 1 day ago, 3 comments)
#75 Fix nsys when running multiple ops (by xuzhao9, closed 4 days ago, 2 comments)
#74 Enable bwd for flash_attention (by xuzhao9, closed 1 day ago, 8 comments)
#73 quick fix to continue with issue 71 (by adamomainz, closed 4 days ago, 4 comments)
#72 Fix the PR CI (by xuzhao9, closed 4 days ago, 2 comments)
#71 Not able to run fp8_gemm_rowwise (by karthik-man, closed 4 days ago, 13 comments)
#70 Fix the PR CI errors (by xuzhao9, closed 4 days ago, 6 comments)
#69 [FA] fix an assertion failure due to refactoring in PR54 (by manman-ren, closed 4 days ago, 3 comments)
#68 Need post-run statistics (by FindHao, opened 5 days ago, 0 comments)
#67 We need to add support for string lists as metrics (by FindHao, opened 5 days ago, 0 comments)
#66 Rename mem_footprint to mem_footprint_compression_ratio (by FindHao, closed 5 days ago, 2 comments)
#65 Add nsys report analyzer (by FindHao, opened 5 days ago, 9 comments)
#64 Install rocm nightly (by xuzhao9, closed 5 days ago, 3 comments)
#63 Test on both pytorch-triton and triton-main (by xuzhao9, closed 5 days ago, 2 comments)
#62 Support sparsity, target-size and sort_by_length for hstu (by manman-ren, closed 5 days ago, 5 comments)
#61 Patch xformers to enable FA3 extension (by xuzhao9, closed 5 days ago, 2 comments)
#60 Install patch in the docker (by xuzhao9, closed 5 days ago, 3 comments)
#59 Update hstu and fix ragged attn (by xuzhao9, closed 6 days ago, 2 comments)
#58 Fix backends in flash_attention and gemm (by xuzhao9, closed 6 days ago, 3 comments)
#57 Use code detection to check bwd method override. (by xuzhao9, closed 6 days ago, 4 comments)
#56 Enable gemm and more operators in the CI (by xuzhao9, closed 1 week ago, 3 comments)
#55 Fix CI test failures (by xuzhao9, closed 1 week ago, 2 comments)
#54 [FA] clean up and make TMA, scheduling autotunable (by manman-ren, closed 1 week ago, 2 comments)
#53 Update HSTU and use the OSS wrapper for non-persisent kernels (by xuzhao9, closed 1 week ago, 2 comments)
#52 Fix a missing attribute issue with FP8 rowwise gemm (by htyu, closed 1 week ago, 2 comments)
#51 update fbgemm to a3da0f3eb84ecc48ff0e445e4df82cd2603862b0 (by htyu, closed 1 week ago, 2 comments)
#50 Improve latency measurement (by xuzhao9, opened 1 week ago, 2 comments)
#49 Build and publish the rocm nightly docker (by xuzhao9, closed 5 days ago, 0 comments)
#48 Add an init file to tools so that it is considered as an module. (by htyu, closed 1 week ago, 2 comments)
#47 Add ufmt linter for pyproject (by xuzhao9, closed 2 weeks ago, 3 comments)
#46 Fix hstu on OSS (by xuzhao9, closed 2 weeks ago, 2 comments)
#45 Add WarpSpec version for Flash Attention (by manman-ren, closed 2 weeks ago, 5 comments)
#44 Rename mem_footprint to mem_footprint_compression_ratio for Clarity (by FindHao, closed 5 days ago, 0 comments)
#43 Add autotune mode to liger kernels (by FindHao, opened 2 weeks ago, 1 comment)
#42 Add ncu_tflops (by FindHao, closed 2 weeks ago, 3 comments)
#41 [performance] Torch SDPA cuDNN backend vs FlashAttention v3 (by antferdom, opened 2 weeks ago, 10 comments)
#40 layer_norm backward problem (by FindHao, opened 2 weeks ago, 2 comments)
#39 Fix the docker build (by xuzhao9, closed 2 weeks ago, 2 comments)
#38 Add nightly benchmarking on Triton pytorch and triton-main versions (by xuzhao9, opened 3 weeks ago, 0 comments)
#37 Update AI computation (by FindHao, closed 3 weeks ago, 4 comments)
#36 Add transformers to dependency and pin its version (by FindHao, closed 3 weeks ago, 3 comments)
#35 Format benchmark function names and change x_val to corresponding input shapes (by FindHao, closed 3 weeks ago, 3 comments)
#34 Add peak memory usage and footprint measurement (by FindHao, closed 3 weeks ago, 4 comments)
#33 Need general flops metric from ncu report (by FindHao, closed 2 weeks ago, 9 comments)
#32 Add layernorm and fix bug for embedding bwd (by FindHao, closed 3 weeks ago, 2 comments)