manman-ren closed this pull request 1 day ago.
@manman-ren has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Can we have example output from running the operator?
@manman-ren merged this pull request in pytorch-labs/tritonbench@45d195cc2d7bf1987c9dcc7ecf7f7989c5b035d9.
Copied over `generate_sparse_seq_len`. Example output:

| x_val | hstu_triton_ragged_attention-latency |
|---|---|
| (256, 4, 16384, 2048, 0.8, 20, False) | 146.458 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 148.616 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 145.135 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 148.98 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 147.167 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 146.155 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 144.787 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 144.055 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 144.35 |
| (256, 4, 16384, 2048, 0.8, 20, False) | 146.67 |
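For readers unfamiliar with the helper: a minimal sketch of what a sparse sequence-length generator like `generate_sparse_seq_len` might look like. The signature, the uniform distribution, and the interpretation of `sparsity` as the fraction of `max_seq_len` available to each ragged sequence are assumptions for illustration, not the actual tritonbench implementation.

```python
import torch


def generate_sparse_seq_len(
    size: int,
    max_seq_len: int,
    sparsity: float,
    device: torch.device = torch.device("cpu"),
) -> torch.Tensor:
    # Hypothetical sketch: draw one length per example, uniformly in
    # [1, sparsity * max_seq_len]. The real helper may use a different
    # distribution and a different meaning for `sparsity`.
    max_len = max(1, int(sparsity * max_seq_len))
    return torch.randint(low=1, high=max_len + 1, size=(size,), device=device)


# Shapes loosely mirror the x_val tuple above: batch=256, max_seq_len=16384,
# sparsity=0.8 (the remaining fields are ignored in this sketch).
lengths = generate_sparse_seq_len(size=256, max_seq_len=16384, sparsity=0.8)
```

The resulting 1-D tensor of per-example lengths is what a ragged-attention benchmark would use to build variable-length inputs.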