pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Decode and Prefill support #3009

Closed: Aya-ZIbra closed this pull request 3 weeks ago

Aya-ZIbra commented 3 weeks ago

Summary: This diff adds support for the Triton-splitk kernel. It includes:
1/ prefill_varseq_attn and decode_attn
2/ a dequantize kernel
3/ fused quantization in the RoPE functions
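
A split-K decode-attention kernel typically partitions the KV sequence into chunks, attends the single query token to each chunk independently, and merges the partial results with a log-sum-exp rescale. Below is a minimal pure-Python sketch for one query and one head, assuming that scheme. It is not FBGEMM's Triton code, and the function names (split_k_decode, partial_attention, combine, reference_decode) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def partial_attention(
    q: Vector, keys: List[Vector], values: List[Vector], scale: float
) -> Tuple[Vector, float, float]:
    """Attend q over one non-empty KV chunk.

    Returns (unnormalized output, max score, sum of exp(score - max)).
    """
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, m, sum(weights)


def combine(partials: List[Tuple[Vector, float, float]]) -> Vector:
    """Merge per-split results by rescaling each to a shared global max."""
    global_max = max(m for _, m, _ in partials)
    dim = len(partials[0][0])
    acc = [0.0] * dim
    total = 0.0
    for out, m, denom in partials:
        factor = math.exp(m - global_max)  # undo each split's local max shift
        total += denom * factor
        for d in range(dim):
            acc[d] += out[d] * factor
    return [x / total for x in acc]


def split_k_decode(
    q: Vector, keys: List[Vector], values: List[Vector], num_splits: int
) -> Vector:
    """Decode attention for a single query token, split along the KV axis."""
    scale = 1.0 / math.sqrt(len(q))
    chunk = math.ceil(len(keys) / num_splits)
    # Each split is independent; a GPU kernel would run these in parallel.
    partials = [
        partial_attention(q, keys[i : i + chunk], values[i : i + chunk], scale)
        for i in range(0, len(keys), chunk)
    ]
    return combine(partials)


def reference_decode(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Unsplit softmax(q K^T / sqrt(d)) V, for comparison."""
    out, _, denom = partial_attention(q, keys, values, 1.0 / math.sqrt(len(q)))
    return [x / denom for x in out]
```

split_k_decode(q, keys, values, num_splits) should match reference_decode(q, keys, values) up to floating-point rounding for any number of splits. Because the splits are independent, this layout lets a kernel spread a long KV sequence across more parallel workers during decode, when there is only one query token per sequence.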

TODO: Dequantize + paged kv cache
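
To illustrate what dequantizing out of a paged KV cache involves, here is a minimal sketch assuming asymmetric row-wise 8-bit quantization (x ≈ code * scale + shift) and a block table that maps each sequence's logical pages to physical pages. PAGE_SIZE, the QuantRow layout, and all function names are hypothetical; the PR does not describe FBGEMM's actual cache format.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4  # tokens per physical page (hypothetical)

# (integer codes, scale, shift); dequantized value = code * scale + shift
QuantRow = Tuple[List[int], float, float]


def quantize_row(row: List[float], bits: int = 8) -> QuantRow:
    """Asymmetric row-wise quantization to unsigned `bits`-bit codes."""
    lo, hi = min(row), max(row)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in row]
    return codes, scale, lo


def dequantize_row(qrow: QuantRow) -> List[float]:
    codes, scale, shift = qrow
    return [c * scale + shift for c in codes]


def gather_dequantized(
    pages: Dict[int, List[QuantRow]], block_table: List[int], seq_len: int
) -> List[List[float]]:
    """Read seq_len tokens of one sequence through its block table."""
    rows = []
    for t in range(seq_len):
        physical_page = block_table[t // PAGE_SIZE]  # logical -> physical page
        rows.append(dequantize_row(pages[physical_page][t % PAGE_SIZE]))
    return rows
```

Each dequantized value is within scale / 2 of the original. A fused kernel would typically dequantize tile by tile inside the attention loop instead of materializing full-precision rows as this sketch does.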

Differential Revision: D60747287

netlify[bot] commented 3 weeks ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 0f4a11fbb9ff01617a54136a9a3745eb2b6541c2
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66c1c56e2325cc00083ec2dd
Deploy Preview: https://deploy-preview-3009--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 3 weeks ago

This pull request was exported from Phabricator. Differential Revision: D60747287

facebook-github-bot commented 3 weeks ago

This pull request has been merged in pytorch/FBGEMM@6bd8cd7d1df4a001e476f38fa5e81104e77d39a6.