pytorch-labs/tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
BSD 3-Clause "New" or "Revised" License · 21 stars · 3 forks
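
To make the one-line description above concrete, here is a minimal sketch of the kind of per-operator measurement tritonbench automates. This is not tritonbench's own harness: it only uses `triton.testing.do_bench`, a standard Triton utility, on a plain PyTorch matmul, and the shapes, dtype, and TFLOPS derivation are arbitrary assumptions for illustration.

```python
# Minimal sketch (NOT tritonbench's actual harness) of benchmarking one
# "operator with an example input": a float16 gemm timed with Triton's
# do_bench utility. Assumes a CUDA GPU; shapes and dtype are arbitrary.
import torch
from triton.testing import do_bench

M = N = K = 4096  # hypothetical example input size
a = torch.randn(M, K, device="cuda", dtype=torch.float16)
b = torch.randn(K, N, device="cuda", dtype=torch.float16)

# do_bench warms up, runs the callable repeatedly, and returns latency in
# milliseconds (mean or median, depending on the Triton version).
ms = do_bench(lambda: torch.matmul(a, b))

# A gemm performs 2*M*N*K floating-point operations.
tflops = (2 * M * N * K) / (ms * 1e-3) / 1e12
print(f"latency: {ms:.3f} ms  throughput: {tflops:.1f} TFLOPS")
```

Latency, TFLOPS, and memory metrics of exactly this kind recur throughout the issue list below (e.g. #50, #42, #34).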
Issues (newest first)
#81 Install tritonbench as a library (by xuzhao9, opened 6 hours ago, 0 comments)
#80 changing hw rooflines to match xformers (by adamomainz, closed 5 hours ago, 4 comments)
#79 Build the nightly workflow (by xuzhao9, opened 9 hours ago, 0 comments)
#78 [GPU_UTILS] H100 specs (by antferdom, closed 2 hours ago, 6 comments)
#77 [FA] add persistent variant (by manman-ren, opened 1 day ago, 2 comments)
#76 Ops bug fix and args clean (by FindHao, closed 1 day ago, 3 comments)
#75 Fix nsys when running multiple ops (by xuzhao9, closed 4 days ago, 2 comments)
#74 Enable bwd for flash_attention (by xuzhao9, closed 1 day ago, 8 comments)
#73 quick fix to continue with issue 71 (by adamomainz, closed 4 days ago, 4 comments)
#72 Fix the PR CI (by xuzhao9, closed 4 days ago, 2 comments)
#71 Not able to run fp8_gemm_rowwise (by karthik-man, closed 4 days ago, 13 comments)
#70 Fix the PR CI errors (by xuzhao9, closed 4 days ago, 6 comments)
#69 [FA] fix an assertion failure due to refactoring in PR54 (by manman-ren, closed 4 days ago, 3 comments)
#68 Need post-run statistics (by FindHao, opened 5 days ago, 0 comments)
#67 We need to add support for string lists as metrics (by FindHao, opened 5 days ago, 0 comments)
#66 Rename mem_footprint to mem_footprint_compression_ratio (by FindHao, closed 5 days ago, 2 comments)
#65 Add nsys report analyzer (by FindHao, opened 5 days ago, 9 comments)
#64 Install rocm nightly (by xuzhao9, closed 5 days ago, 3 comments)
#63 Test on both pytorch-triton and triton-main (by xuzhao9, closed 5 days ago, 2 comments)
#62 Support sparsity, target-size and sort_by_length for hstu (by manman-ren, closed 5 days ago, 5 comments)
#61 Patch xformers to enable FA3 extension (by xuzhao9, closed 5 days ago, 2 comments)
#60 Install patch in the docker (by xuzhao9, closed 5 days ago, 3 comments)
#59 Update hstu and fix ragged attn (by xuzhao9, closed 6 days ago, 2 comments)
#58 Fix backends in flash_attention and gemm (by xuzhao9, closed 6 days ago, 3 comments)
#57 Use code detection to check bwd method override. (by xuzhao9, closed 6 days ago, 4 comments)
#56 Enable gemm and more operators in the CI (by xuzhao9, closed 1 week ago, 3 comments)
#55 Fix CI test failures (by xuzhao9, closed 1 week ago, 2 comments)
#54 [FA] clean up and make TMA, scheduling autotunable (by manman-ren, closed 1 week ago, 2 comments)
#53 Update HSTU and use the OSS wrapper for non-persisent kernels (by xuzhao9, closed 1 week ago, 2 comments)
#52 Fix a missing attribute issue with FP8 rowwise gemm (by htyu, closed 1 week ago, 2 comments)
#51 update fbgemm to a3da0f3eb84ecc48ff0e445e4df82cd2603862b0 (by htyu, closed 1 week ago, 2 comments)
#50 Improve latency measurement (by xuzhao9, opened 1 week ago, 2 comments)
#49 Build and publish the rocm nightly docker (by xuzhao9, closed 5 days ago, 0 comments)
#48 Add an init file to tools so that it is considered as an module. (by htyu, closed 1 week ago, 2 comments)
#47 Add ufmt linter for pyproject (by xuzhao9, closed 2 weeks ago, 3 comments)
#46 Fix hstu on OSS (by xuzhao9, closed 2 weeks ago, 2 comments)
#45 Add WarpSpec version for Flash Attention (by manman-ren, closed 2 weeks ago, 5 comments)
#44 Rename mem_footprint to mem_footprint_compression_ratio for Clarity (by FindHao, closed 5 days ago, 0 comments)
#43 Add autotune mode to liger kernels (by FindHao, opened 2 weeks ago, 1 comment)
#42 Add ncu_tflops (by FindHao, closed 2 weeks ago, 3 comments)
#41 [performance] Torch SDPA cuDNN backend vs FlashAttention v3 (by antferdom, opened 2 weeks ago, 10 comments)
#40 layer_norm backward problem (by FindHao, opened 2 weeks ago, 2 comments)
#39 Fix the docker build (by xuzhao9, closed 2 weeks ago, 2 comments)
#38 Add nightly benchmarking on Triton pytorch and triton-main versions (by xuzhao9, opened 3 weeks ago, 0 comments)
#37 Update AI computation (by FindHao, closed 3 weeks ago, 4 comments)
#36 Add transformers to dependency and pin its version (by FindHao, closed 3 weeks ago, 3 comments)
#35 Format benchmark function names and change x_val to corresponding input shapes (by FindHao, closed 3 weeks ago, 3 comments)
#34 Add peak memory usage and footprint measurement (by FindHao, closed 3 weeks ago, 4 comments)
#33 Need general flops metric from ncu report (by FindHao, closed 2 weeks ago, 9 comments)
#32 Add layernorm and fix bug for embedding bwd (by FindHao, closed 3 weeks ago, 2 comments)