jasperzhong closed 3 years ago
Great work! The PS design doc says embedding parameters are replicated across servers and must be synchronized to stay consistent. I'm concerned about the synchronization overhead, since embedding layers are usually huge. For example, Facebook mentioned that its production embedding tables can reach terabytes in size (paper link).
Yes, synchronizing huge embedding layers does add overhead. To improve efficiency, we can adjust how much staleness the gradient update tolerates. For example, the PS can drop a gradient whose version is too stale.
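A minimal sketch of the "drop stale gradients" idea, assuming each embedding row on a server carries a version counter that increments on every applied update, and each worker tags its gradient with the version it pulled. `EmbeddingShard`, its methods, and the `max_staleness` bound are hypothetical names for illustration, not part of the PS design doc. The sketch covers only the staleness check on a single shard, not replication across servers.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddingShard:
    """Hypothetical PS shard: embedding rows plus a per-row version counter."""

    dim: int
    lr: float = 0.1
    max_staleness: int = 2  # assumed bound: how many versions behind a gradient may be
    rows: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)
    dropped: int = 0

    def _ensure(self, key):
        # Lazily create a zero-initialized row at version 0.
        if key not in self.rows:
            self.rows[key] = [0.0] * self.dim
            self.versions[key] = 0

    def pull(self, key):
        """Return a copy of the row and the version the worker read."""
        self._ensure(key)
        return list(self.rows[key]), self.versions[key]

    def push(self, key, grad, read_version):
        """Apply an SGD step unless the gradient is too stale; return True if applied."""
        self._ensure(key)
        staleness = self.versions[key] - read_version
        if staleness > self.max_staleness:
            self.dropped += 1  # stale gradient is discarded, not applied
            return False
        row = self.rows[key]
        for i, g in enumerate(grad):
            row[i] -= self.lr * g
        self.versions[key] += 1
        return True


shard = EmbeddingShard(dim=2, max_staleness=1)
_, v0 = shard.pull(7)          # worker reads row 7 at version 0
shard.push(7, [1.0, 1.0], v0)  # staleness 0: applied
shard.push(7, [1.0, 1.0], v0)  # staleness 1: applied
shard.push(7, [1.0, 1.0], v0)  # staleness 2 > 1: dropped
```

Raising `max_staleness` keeps more updates but lets gradients computed from older parameter values through; lowering it discards more work in exchange for fresher updates.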