pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

parallelizing L2 cache lookup #3032

Closed duduyi2013 closed 2 weeks ago

duduyi2013 commented 2 weeks ago

Summary: Change sets

  1. Instead of allocating an intermediate tensor to collect the L2 cache miss info, do all the embedding copies inside the originally provided tensor and mark the related indices as -1.
  2. Parallelize the cache lookup logic using multiple cachelib pools, which helps reduce LRU contention.
  3. Fix a cachelib -> UVA tensor data copy bug (wrong offset).
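
Items 1 and 2 above can be sketched in plain PyTorch. This is a hypothetical illustration, not the PR's implementation (which lives in FBGEMM's C++/cachelib backend): `lookup_inplace`, `parallel_lookup`, and the dict-backed pools are stand-in names, and the shard-by-modulo scheme is an assumed partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

import torch


def lookup_inplace(cache, indices, output):
    """Copy cache hits straight into the caller-provided output tensor and
    overwrite the matching index with -1; the indices still >= 0 afterwards
    are exactly the misses the backend still has to fetch (no intermediate
    miss tensor is allocated)."""
    remaining = indices.clone()
    for row, idx in enumerate(indices.tolist()):
        emb = cache.get(idx)
        if emb is not None:
            output[row] = emb      # hit: fill the provided tensor in place
            remaining[row] = -1    # mark this slot as already served
    return remaining


def parallel_lookup(pools, indices, output):
    """Shard indices across independent pools (idx % num_pools) and look each
    shard up in its own thread, so no single LRU lock serializes the whole
    lookup. Each thread touches a disjoint set of rows."""
    remaining = indices.clone()
    num_pools = len(pools)

    def work(pool_id):
        for row, idx in enumerate(indices.tolist()):
            if idx % num_pools != pool_id:
                continue  # this index belongs to another pool's shard
            emb = pools[pool_id].get(idx)
            if emb is not None:
                output[row] = emb
                remaining[row] = -1

    with ThreadPoolExecutor(max_workers=num_pools) as ex:
        list(ex.map(work, range(num_pools)))
    return remaining
```

Because the shards are disjoint by construction, the worker threads never race on the same row of `output` or `remaining`, which is what lets the per-pool LRU state stay uncontended.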

Differential Revision: D61417947

netlify[bot] commented 2 weeks ago

Deploy Preview for pytorch-fbgemm-docs failed.

Name              | Link
Latest commit     | 4237e3a7f6357301f9253739df6fd44b6b48a8d1
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66cfae3fdb2e2a00086b17f1

facebook-github-bot commented 2 weeks ago

This pull request was exported from Phabricator. Differential Revision: D61417947


facebook-github-bot commented 2 weeks ago

This pull request has been merged in pytorch/FBGEMM@c41d67646480fcf648bb2695f83d06221fa9933a.