jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

NeurIPS '18 | Mesh-TensorFlow: Deep Learning for Supercomputers #169

Closed: jasperzhong closed this 3 years ago

jasperzhong commented 3 years ago

https://papers.nips.cc/paper/2018/file/3a37abdeefe1dab1b30f7c5c7e581b93-Paper.pdf

jasperzhong commented 3 years ago

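Indeed. Data parallelism just splits the "batch" dimension (parameters that have no "batch" dimension are replicated). More generally, any dimension can be split, and that is model parallelism (tensors that lack that dimension are replicated).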
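Here is a minimal sketch of this idea. It assumes a Mesh-TensorFlow-style layout that maps named tensor dimensions to named mesh dimensions. `shard_shape`, the mesh axis name `procs`, and the tensor sizes are hypothetical illustrations and are not the `mesh_tensorflow` API. Moving from data parallelism to model parallelism only changes which dimension the layout splits:

```python
from typing import Dict, List, Tuple

# A named tensor shape, e.g. [("batch", 512), ("hidden", 1024)].
Dims = List[Tuple[str, int]]


def shard_shape(dims: Dims, layout: Dict[str, str], mesh: Dict[str, int]) -> List[int]:
    """Hypothetical helper: shape of the slice each processor holds.

    layout maps a tensor-dimension name to the mesh dimension it is split across.
    A tensor dimension absent from layout is not split, so every processor keeps
    the whole dimension (the tensor is replicated along that mesh axis).
    """
    used = [layout[name] for name, _ in dims if name in layout]
    if len(used) != len(set(used)):
        # Two dimensions of one tensor cannot be split across the same mesh axis.
        raise ValueError("layout maps two tensor dimensions to one mesh dimension")
    shape = []
    for name, size in dims:
        if name in layout:
            parts = mesh[layout[name]]
            if size % parts:
                raise ValueError(f"{name}={size} is not divisible by {parts}")
            shape.append(size // parts)
        else:
            shape.append(size)
    return shape


MESH = {"procs": 8}  # hypothetical 1-D mesh of 8 processors
ACTIVATIONS = [("batch", 512), ("hidden", 1024)]
WEIGHTS = [("hidden", 1024), ("ff", 4096)]  # no "batch" dimension

# Data parallelism: split only "batch"; WEIGHTS has no "batch", so it is replicated.
data_parallel = {"batch": "procs"}
# Model parallelism: split "ff" instead; ACTIVATIONS has no "ff", so it is replicated.
model_parallel = {"ff": "procs"}

for name, layout in [("data", data_parallel), ("model", model_parallel)]:
    print(name, shard_shape(ACTIVATIONS, layout, MESH), shard_shape(WEIGHTS, layout, MESH))
```

With `{"batch": "procs"}`, each processor holds a 64x1024 slice of the activations and a full copy of the weights. With `{"ff": "procs"}` the roles flip: the activations are replicated and each processor holds a 1024x512 slice of the weights.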

This is also the problem that OneFlow's SBP (Split, Broadcast, Partial) abstraction addresses. I think SBP looks a bit more elegant.
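To show what the extra "Partial" state adds beyond split/replicate, here is a small pure-Python sketch. It is not OneFlow's API; `matmul`, `split_rows`, `split_cols`, `add` and the matrices are hypothetical and exist only for illustration. In SBP notation, S(0) splits along rows, S(1) splits along columns, B is a full copy on every device, and P means each device holds a partial value that must be summed to recover the logical tensor. For `Y = X @ W` on two devices: X as S(0) with W as B gives Y as S(0), while X as S(1) with W as S(0) gives Y as P.

```python
from functools import reduce
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, standing in for each device's local kernel."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_rows(m: Matrix, n: int) -> List[Matrix]:
    """S(0): device i gets the i-th block of rows."""
    step = len(m) // n
    return [m[i * step:(i + 1) * step] for i in range(n)]


def split_cols(m: Matrix, n: int) -> List[Matrix]:
    """S(1): device i gets the i-th block of columns."""
    step = len(m[0]) // n
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(n)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, i.e. the reduction that turns P into a full tensor."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


X = [[1, 2, 3, 4], [5, 6, 7, 8]]        # 2x4
W = [[1, 0], [0, 1], [1, 1], [2, -1]]   # 4x2
Y = matmul(X, W)                        # reference result on one device

# X: S(0), W: B  ->  Y: S(0). Concatenating the row blocks recovers Y.
s0_parts = [matmul(x, W) for x in split_rows(X, 2)]
s0_result = [row for part in s0_parts for row in part]

# X: S(1), W: S(0)  ->  Y: P. Each device holds a full-shape partial sum.
partials = [matmul(x, w) for x, w in zip(split_cols(X, 2), split_rows(W, 2))]
p_result = reduce(add, partials)

print(Y, s0_result == Y, p_result == Y)
```

The second case is the one plain split/replicate cannot express: each device's output already has Y's full shape but is not Y yet, so the layout needs a third state that records "sum across devices still pending".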