pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

parallelizing L2 cache lookup #3032

Closed duduyi2013 closed 2 weeks ago

duduyi2013 commented 2 weeks ago

Summary: Change sets

  1. Instead of allocating an intermediate tensor to collect the L2 cache miss info, do all the embedding copies inside the originally provided tensor and mark the related indices as -1.
  2. Parallelize the cache lookup logic using multiple cachelib pools, which helps reduce LRU contention.
  3. Fix a cachelib -> UVA tensor data copy bug (wrong offset).
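
Items 1 and 2 above can be sketched in plain PyTorch. This is a hypothetical illustration, not the PR's implementation (which lives in FBGEMM's C++/cachelib backend): `lookup_inplace`, `parallel_lookup`, and the dict-backed pools are stand-in names, and the shard-by-modulo scheme is an assumed partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

import torch


def lookup_inplace(cache, indices, output):
    """Copy cache hits straight into the caller-provided output tensor and
    overwrite the matching index with -1; the indices still >= 0 afterwards
    are exactly the misses the backend still has to fetch (no intermediate
    miss tensor is allocated)."""
    remaining = indices.clone()
    for row, idx in enumerate(indices.tolist()):
        emb = cache.get(idx)
        if emb is not None:
            output[row] = emb      # hit: fill the provided tensor in place
            remaining[row] = -1    # mark this slot as already served
    return remaining


def parallel_lookup(pools, indices, output):
    """Shard indices across independent pools (idx % num_pools) and look each
    shard up in its own thread, so no single LRU lock serializes the whole
    lookup. Each thread touches a disjoint set of rows."""
    remaining = indices.clone()
    num_pools = len(pools)

    def work(pool_id):
        for row, idx in enumerate(indices.tolist()):
            if idx % num_pools != pool_id:
                continue  # this index belongs to another pool's shard
            emb = pools[pool_id].get(idx)
            if emb is not None:
                output[row] = emb
                remaining[row] = -1

    with ThreadPoolExecutor(max_workers=num_pools) as ex:
        list(ex.map(work, range(num_pools)))
    return remaining
```

Because the shards are disjoint by construction, the worker threads never race on the same row of `output` or `remaining`, which is what lets the per-pool LRU state stay uncontended.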

Differential Revision: D61417947

netlify[bot] commented 2 weeks ago

Deploy Preview for pytorch-fbgemm-docs failed.

Name              | Link
Latest commit     | 4237e3a7f6357301f9253739df6fd44b6b48a8d1
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66cfae3fdb2e2a00086b17f1

facebook-github-bot commented 2 weeks ago

This pull request was exported from Phabricator. Differential Revision: D61417947


facebook-github-bot commented 2 weeks ago

This pull request has been merged in pytorch/FBGEMM@c41d67646480fcf648bb2695f83d06221fa9933a.