Decode and Prefill support

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Decode and Prefill support #3009

Closed Aya-ZIbra closed 3 weeks ago

Aya-ZIbra commented 3 weeks ago

Summary: This diff adds support for the Triton-splitk kernel. It includes:
1/ prefill_varseq_attn and decode_attn
2/ dequantize kernel
3/ fused quantization in rope functions
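For readers unfamiliar with the split-K idea behind decode_attn, the following is a minimal pure-Python sketch, not the Triton kernel from this diff. It assumes a single query vector and a single head, and the function names `reference_decode_attn` and `splitk_decode_attn` as well as the `num_splits` parameter are hypothetical. Each split of the KV sequence produces a partial softmax (local max, local sum, weighted values), and the partials are then merged with a log-sum-exp style rescaling.

```python
import math


def reference_decode_attn(q, keys, values):
    """Plain softmax attention for one query vector (baseline for comparison)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def splitk_decode_attn(q, keys, values, num_splits=4):
    """Hypothetical split-K decode: process KV chunks independently, then merge."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    chunk = math.ceil(len(keys) / num_splits)

    # Phase 1: each chunk computes its own max, exp-sum, and unnormalized output.
    # On a GPU these chunks would run in parallel; here they run in a loop.
    partials = []
    for start in range(0, len(keys), chunk):
        k_chunk = keys[start:start + chunk]
        v_chunk = values[start:start + chunk]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_chunk]
        m = max(scores)
        p = [math.exp(s - m) for s in scores]
        acc = [sum(pi * v[j] for pi, v in zip(p, v_chunk)) for j in range(dim)]
        partials.append((m, sum(p), acc))

    # Phase 2: rescale every partial to the global max and combine.
    global_max = max(m for m, _, _ in partials)
    denom = 0.0
    numer = [0.0] * dim
    for m, s, acc in partials:
        alpha = math.exp(m - global_max)
        denom += alpha * s
        for j in range(dim):
            numer[j] += alpha * acc[j]
    return [x / denom for x in numer]
```

Splitting along the KV sequence is useful in decode because there is only one query token per sequence, so the sequence dimension is the main source of parallelism; the merge step keeps the result numerically equal to a single-pass softmax up to floating-point rounding.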

TODO: Dequantize + paged KV cache

Differential Revision: D60747287

netlify[bot] commented 3 weeks ago

Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
Latest commit	0f4a11fbb9ff01617a54136a9a3745eb2b6541c2
Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66c1c56e2325cc00083ec2dd
Deploy Preview	https://deploy-preview-3009--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 3 weeks ago

This pull request was exported from Phabricator. Differential Revision: D60747287

facebook-github-bot commented 3 weeks ago

This pull request has been merged in pytorch/FBGEMM@6bd8cd7d1df4a001e476f38fa5e81104e77d39a6.
