pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
1.18k stars 486 forks

move memory copy into one_shot_all_reduce #2770

Closed: xw285cornell closed this 3 months ago

xw285cornell commented 3 months ago

Summary: Moves the memory copy into the one_shot_all_reduce kernel to avoid the latency of launching a separate hipMemcpyAsync. Benchmarking showed a 3-4 µs reduction, and end-to-end testing also showed improvements.

Moved from #2693 to fix some formatting issues. Thanks to @wenkaidu for contributing.

Reviewed By: sryap, jianyuh

Differential Revision: D58223358
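The summary describes fusing the staging copy into the all-reduce kernel itself. The Python sketch below is a host-side simulation, not FBGEMM's HIP code: the function names, the list-based "buffers", and the launch counter are all hypothetical. It assumes a one-shot scheme in which each rank first copies its input into a peer-visible communication buffer and then sums every peer's buffer directly, and it only illustrates why doing the copy inside the kernel saves one launch per rank while producing the same result.

```python
from typing import List, Tuple

Vector = List[float]


def _reduce_from_peers(comm_buffers: List[Vector]) -> Vector:
    """One-shot reduction: a rank reads every peer's buffer directly and sums them."""
    n = len(comm_buffers[0])
    return [sum(buf[i] for buf in comm_buffers) for i in range(n)]


def all_reduce_separate_copy(inputs: List[Vector]) -> Tuple[List[Vector], int]:
    """Baseline (hypothetical): a standalone copy launch per rank, then a reduce launch."""
    launches = 0
    comm_buffers: List[Vector] = [[0.0] * len(x) for x in inputs]
    for rank, x in enumerate(inputs):
        comm_buffers[rank][:] = x  # stands in for hipMemcpyAsync(input -> comm buffer)
        launches += 1
    outputs: List[Vector] = []
    for _rank in range(len(inputs)):
        outputs.append(_reduce_from_peers(comm_buffers))  # stands in for the reduce kernel
        launches += 1
    return outputs, launches


def all_reduce_fused_copy(inputs: List[Vector]) -> Tuple[List[Vector], int]:
    """Fused (hypothetical): the copy is the first phase of the same kernel."""
    launches = len(inputs)  # one kernel launch per rank covers copy + reduce
    comm_buffers: List[Vector] = [[0.0] * len(x) for x in inputs]
    # Phase 1 (inside the kernel): each rank copies its input into its comm buffer.
    for rank, x in enumerate(inputs):
        comm_buffers[rank][:] = x
    # A real kernel needs a cross-rank barrier here; in this sequential simulation,
    # finishing the loop above before starting the one below plays that role.
    # Phase 2 (same kernel): each rank reduces over all peers' buffers.
    outputs = [_reduce_from_peers(comm_buffers) for _ in inputs]
    return outputs, launches


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0]]  # two simulated ranks
    print(all_reduce_separate_copy(data))  # ([[4.0, 6.0], [4.0, 6.0]], 4)
    print(all_reduce_fused_copy(data))     # ([[4.0, 6.0], [4.0, 6.0]], 2)
```

Both variants produce identical results; the fused one simply has no separate copy launch to pay for, which is the kind of per-call overhead the 3-4 µs figure above refers to.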

facebook-github-bot commented 3 months ago

This pull request was exported from Phabricator. Differential Revision: D58223358

netlify[bot] commented 3 months ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 3b52730e2f18e1478d84379fbd35b59101673b4b
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/667638a4cd502500086fab06
Deploy Preview: https://deploy-preview-2770--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 3 months ago

This pull request has been merged in pytorch/FBGEMM@7f77444beed13e70604ef5d3adf01c8648a7cc3f.