pytorch-labs / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
BSD 3-Clause "New" or "Revised" License

Support sparsity, target-size and sort_by_length for hstu #62

Closed · manman-ren closed this pull request 1 day ago

manman-ren commented 2 days ago

Copied over `generate_sparse_seq_len`.

Example output:

x_val                                     hstu_triton_ragged_attention-latency
(256, 4, 16384, 2048, 0.8, 20, False)     146.458
(256, 4, 16384, 2048, 0.8, 20, False)     148.616
(256, 4, 16384, 2048, 0.8, 20, False)     145.135
(256, 4, 16384, 2048, 0.8, 20, False)     148.98
(256, 4, 16384, 2048, 0.8, 20, False)     147.167
(256, 4, 16384, 2048, 0.8, 20, False)     146.155
(256, 4, 16384, 2048, 0.8, 20, False)     144.787
(256, 4, 16384, 2048, 0.8, 20, False)     144.055
(256, 4, 16384, 2048, 0.8, 20, False)     144.35
(256, 4, 16384, 2048, 0.8, 20, False)     146.67
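For readers unfamiliar with the sparsity knob above: the idea is to draw a per-batch sequence length averaging roughly `sparsity * max_seq_len`, optionally sorted (`sort_by_length`) before being fed to the ragged-attention kernel. The sketch below is a hypothetical, dependency-free illustration of that sampling scheme, not the actual `generate_sparse_seq_len` implementation from the PR (which operates on PyTorch tensors); the Gaussian spread of 10% of the mean is an assumption for illustration only.

```python
import random

def generate_sparse_seq_len(batch_size, max_seq_len, sparsity, seed=0):
    # Hypothetical sketch: sample one length per batch element whose mean
    # is approximately sparsity * max_seq_len, clamped to [1, max_seq_len].
    rng = random.Random(seed)
    mean_len = sparsity * max_seq_len
    return [
        max(1, min(max_seq_len, int(rng.gauss(mean_len, 0.1 * mean_len))))
        for _ in range(batch_size)
    ]

# Matches the x_val shape above: batch 256, max_seq_len 16384, sparsity 0.8.
seq_lens = generate_sparse_seq_len(batch_size=256, max_seq_len=16384, sparsity=0.8)
# sort_by_length: process batches in ascending length order.
seq_lens_sorted = sorted(seq_lens)
```

Sorting by length groups similarly sized sequences together, which tends to reduce padding waste and tail-latency variance in ragged-attention benchmarks.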

facebook-github-bot commented 2 days ago

@manman-ren has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xuzhao9 commented 2 days ago

Can we have example output from running the operator?

facebook-github-bot commented 1 day ago

@manman-ren has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 day ago

@manman-ren merged this pull request in pytorch-labs/tritonbench@45d195cc2d7bf1987c9dcc7ecf7f7989c5b035d9.