joapolarbear/dl_notes
Optimizing Large-scale Deep Learning by Minimizing Resource Contention for Data Processing
Issue #32 (Open), opened by joapolarbear 3 years ago
Poster
A good explanation of the Horovod workflow.
Problem: Horovod's background threads must synchronize with one another to agree on which tensors are ready on every worker, and this synchronization can be time-consuming.
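A minimal Python sketch of the negotiation step described above. It is loosely modeled on a coordinator that counts readiness reports per tensor and releases a tensor only once all `world_size` ranks have reported it. The function name and data layout are hypothetical, not Horovod's API. The point is that every cycle needs this round of communication before any allreduce can start, which is where the cost comes from.

```python
from collections import Counter
from typing import Dict, List


def negotiate(requests: Dict[int, List[str]], world_size: int, pending: Counter) -> List[str]:
    """One negotiation cycle on the coordinator (hypothetical helper).

    requests maps rank -> tensor names that rank newly reported as ready.
    pending carries partial counts over from earlier cycles.
    """
    for names in requests.values():
        pending.update(names)
    # A tensor can be allreduced only when every rank has reported it.
    ready = sorted(name for name, count in pending.items() if count == world_size)
    for name in ready:
        del pending[name]
    return ready


pending: Counter = Counter()
print(negotiate({0: ["grad_a", "grad_b"], 1: ["grad_a"]}, 2, pending))  # ['grad_a']
print(negotiate({0: [], 1: ["grad_b"]}, 2, pending))                    # ['grad_b']
```

`grad_b` is ready on rank 0 in the first cycle but has to wait a full extra cycle for rank 1, which illustrates why the per-cycle synchronization latency matters.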
Solution
Use a global sleep time instead of a local cycle time to avoid oversleeping
Nonblocking Cache Synchronization
Static CPU Resource Partitioning
Graph Topology Exploitation to enforce the tensor order
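The notes do not define the global sleep time precisely. The following is one plausible reading, sketched in Python. It assumes that a "local cycle time" means sleeping a fixed `cycle_time` after each cycle's work, which stretches the real period to work + cycle_time. It also assumes that a "global sleep time" means waking at the next tick of a schedule anchored at a start time shared by all workers, which in turn assumes roughly synchronized clocks. The helper names are hypothetical. In Horovod itself the cycle length is configured with `HOROVOD_CYCLE_TIME`.

```python
def global_sleep(cycle_time: float, start: float, now: float) -> float:
    """Time left until the next tick of a schedule anchored at a shared start time."""
    remainder = (now - start) % cycle_time
    return cycle_time - remainder if remainder > 0 else 0.0


def simulate(work_times, cycle_time: float, policy: str):
    """Return the start time of each background cycle under the given sleep policy."""
    t, starts = 0.0, []
    for work in work_times:
        starts.append(t)
        t += work
        if policy == "local":
            t += cycle_time  # full sleep after the work: the period stretches (oversleep)
        elif policy == "global":
            t += global_sleep(cycle_time, 0.0, t)  # sleep only the remainder of the cycle
        else:
            raise ValueError(f"unknown policy: {policy}")
    return starts


print(simulate([2.0, 2.0, 2.0], 5.0, "local"))   # [0.0, 7.0, 14.0]
print(simulate([2.0, 2.0, 2.0], 5.0, "global"))  # [0.0, 5.0, 10.0]
```

With 2 time units of work per cycle, the local policy drifts to a 7-unit period, while the global policy keeps every cycle on the 5-unit grid.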
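Static CPU resource partitioning, as a minimal Python sketch. It assumes the idea is to give data processing, the Horovod background thread, and the framework's compute threads fixed, disjoint sets of cores so that they stop contending for the same CPUs. The three role names and the core counts are hypothetical. `os.sched_setaffinity` is the standard-library call for pinning, and it is only available on Linux.

```python
import os
from typing import Dict, List


def partition_cores(cores: List[int], shares: Dict[str, int]) -> Dict[str, List[int]]:
    """Split core IDs into fixed, disjoint groups, one per role (roles are hypothetical)."""
    if sum(shares.values()) > len(cores):
        raise ValueError("requested more cores than are available")
    plan, next_core = {}, 0
    for role, count in shares.items():
        plan[role] = cores[next_core:next_core + count]
        next_core += count
    return plan


def pin_current_process(core_ids: List[int]) -> None:
    """Restrict the calling process to the given cores (no-op where the Linux-only API is missing)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(core_ids))


plan = partition_cores(list(range(8)), {"horovod_bg": 1, "data_loader": 5, "framework": 2})
print(plan)  # {'horovod_bg': [0], 'data_loader': [1, 2, 3, 4, 5], 'framework': [6, 7]}
```

Each worker process would then call `pin_current_process(plan[role])` for its own role. The split is decided once up front rather than adjusted at runtime, which is what "static" refers to here.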
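The notes do not say how the graph topology is exploited. The following Python sketch assumes the goal is for every worker to derive the same tensor order locally from its identical computation graph, so that the order does not have to be negotiated at runtime. It uses Kahn's topological sort with a name-based tie-break so the result is deterministic. The graph and tensor names are made up for illustration.

```python
import heapq
from typing import Dict, List


def deterministic_topo_order(deps: Dict[str, List[str]]) -> List[str]:
    """Topological order of a DAG; ties broken by name so every worker gets the same result.

    deps maps each node to the nodes it depends on.
    """
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)
    indegree = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for node, parents in deps.items():
        for parent in parents:
            indegree[node] += 1
            children[parent].append(node)
    # Min-heap of nodes whose dependencies are all satisfied.
    frontier = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(frontier)
    order: List[str] = []
    while frontier:
        node = heapq.heappop(frontier)
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(frontier, child)
    if len(order) != len(nodes):
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical backward graph: each gradient becomes available after its upstream node.
backward = {
    "loss": [],
    "grad_fc2": ["loss"],
    "grad_b2": ["loss"],
    "grad_fc1": ["grad_fc2"],
    "grad_conv": ["grad_fc1"],
}
grads = [n for n in deterministic_topo_order(backward) if n.startswith("grad_")]
print(grads)  # ['grad_b2', 'grad_fc2', 'grad_fc1', 'grad_conv']
```

The tie-break matters: `grad_b2` and `grad_fc2` become ready at the same point. Without a fixed rule, different workers could emit them in different orders.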
[Figures: Poster, Solution]