pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Fix segfault in training unit tests #2929

Closed: sryap closed this 1 month ago

sryap commented 1 month ago

Summary: Before this diff, the SSD-TBE unit tests crashed with a segmentation fault (P1507485454). The cause was premature tensor deallocation when the unit test invoked set_cuda. Because set_cuda is non-blocking (asynchronous), the unit test must keep the input tensors alive until set_cuda completes. However, the unit test allocated the input tensor inside a for-loop, as a loop-local variable, so the tensor was deallocated as soon as each iteration finished, which caused the segmentation fault.

This diff fixes the problem by moving the tensor's scope outside the for-loop and adding proper synchronization, so that the input tensor stays alive until set_cuda completes.
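
The lifetime issue described above can be illustrated with a minimal, stdlib-only sketch. It is an analogy and not the actual FBGEMM test code: `Buffer`, `async_set`, `run_buggy`, and `run_fixed` are hypothetical names, a worker thread stands in for the asynchronous set_cuda call, and an explicit `free()` stands in for tensor deallocation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class Buffer:
    """Hypothetical stand-in for a tensor whose memory can be freed explicitly."""

    def __init__(self, values):
        self.values = list(values)
        self.freed = False

    def free(self):
        self.freed = True
        self.values = None


def async_set(store, key, buf, delay=0.05):
    """Hypothetical stand-in for a non-blocking call such as set_cuda.

    Runs on a worker thread and reads `buf` only after a short delay.
    """
    time.sleep(delay)
    if buf.freed:
        raise RuntimeError("input buffer was freed before the async call finished")
    store[key] = sum(buf.values)


def run_buggy(num_iters=3):
    """Frees each input at the end of its loop iteration (the failure mode)."""
    store = {}
    futures = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i in range(num_iters):
            buf = Buffer(range(i + 1))
            futures.append(pool.submit(async_set, store, i, buf))
            buf.free()  # released before the worker has read it
    return [f.exception() for f in futures]


def run_fixed(num_iters=3):
    """Keeps inputs alive outside the loop and synchronizes before freeing."""
    store = {}
    live_buffers = []  # declared outside the loop so inputs outlive each iteration
    futures = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i in range(num_iters):
            buf = Buffer(range(i + 1))
            live_buffers.append(buf)
            futures.append(pool.submit(async_set, store, i, buf))
        wait(futures)  # synchronization point: all async work is done
        for f in futures:
            f.result()  # re-raise any worker error
    for buf in live_buffers:
        buf.free()  # safe to release only after synchronization
    return store
```

In this sketch, `run_buggy()` returns one `RuntimeError` per iteration, while `run_fixed()` completes normally. In a CUDA setting the analogous synchronization point would be something like `torch.cuda.synchronize()` or a stream/event wait; which mechanism the actual fix uses is not stated in this summary.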

Differential Revision: D60627636

netlify[bot] commented 1 month ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 7c4b2764b8638eff1e615f583dcdfa282199c270
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66ad6d9e1a208e00082cb34e
Deploy Preview: https://deploy-preview-2929--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 1 month ago

This pull request was exported from Phabricator. Differential Revision: D60627636

facebook-github-bot commented 1 month ago

This pull request has been merged in pytorch/FBGEMM@9cbf073787eca4ff5e296f2ea74fe6adbcd279eb.