Closed duduyi2013 closed 1 month ago
Name | Link |
---|---|
Latest commit | 639a2f71dc265a36ef7c83c2a7d3ec20a7a80671 |
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66b0128ebcb31a0008ce60a6 |
Deploy Preview | https://deploy-preview-2930--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D60635718
This pull request has been merged in pytorch/FBGEMM@6607072fe9cf1f9f48131a04825fb87eca86ffe6.
Summary: The reason we need this is that we constantly see port conflict errors during RocksDB initialization. Before this diff we call getFreePort to get an available port. For each SSD TBE we create 32 RocksDB shards, so in total 256 ports are needed per host. This works fine with 4 hosts, but once we run a 16-host training job we need to make sure all 16 hosts avoid the corner case where multiple DB shards get assigned the same free port.
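To illustrate why this pattern is racy: a typical getFreePort helper (the name here mirrors the summary; the implementation below is a hypothetical sketch, not FBGEMM's actual code) binds a socket to port 0 so the kernel picks a free port, then closes the socket and returns the number. Between that close and the moment RocksDB actually binds the port, any other shard or process on the host can claim it, so collisions grow more likely as shard and host counts increase.

```python
import socket

def get_free_port() -> int:
    """Hypothetical sketch of a getFreePort-style helper.

    Binding to port 0 asks the OS for an unused ephemeral port. The
    socket is closed before returning, so the port is free again -- and
    another process can grab it before the caller binds it. With 32
    RocksDB shards per SSD TBE across many hosts, this window is what
    produces the port-conflict errors described in the summary.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 => kernel assigns a currently free port
        return s.getsockname()[1]

if __name__ == "__main__":
    print(get_free_port())
```

Removing the dependency on per-shard free-port probing (as this diff does) sidesteps the race entirely rather than trying to shrink the window.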
Differential Revision: D60635718