Closed jasperzhong closed 3 years ago
https://papers.nips.cc/paper/2018/file/3a37abdeefe1dab1b30f7c5c7e581b93-Paper.pdf
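Indeed. Data parallelism is just splitting along the "batch" dimension (parameters have no "batch" dimension, so they are replicated). More generally, you can split along any dimension, and that is model parallelism (tensors without that dimension are replicated).
This is also the problem that OneFlow's SBP addresses. I think SBP looks a bit more elegant.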
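To make the "split one dimension, replicate whatever lacks it" idea concrete, here is a minimal sketch for a single matmul. It is not taken from the paper or from OneFlow: plain Python lists stand in for tensors, list slices stand in for devices, and the helper names (`matmul`, `split_rows`, `split_cols`) are hypothetical.

```python
# Illustrative sketch only: nested lists play the role of tensors and
# each shard plays the role of one device. All helper names are made up.

def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_rows(m, parts):
    """Split a matrix into `parts` equal chunks along dimension 0."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]

def split_cols(m, parts):
    """Split a matrix into `parts` equal chunks along dimension 1."""
    step = len(m[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(parts)]

x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # activations: batch=4, hidden=2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]       # parameters: hidden=2, out=4
reference = matmul(x, w)               # unsharded result for comparison

# Data parallelism: split x along "batch"; w has no "batch" dim, so replicate it.
dp_shards = [matmul(x_shard, w) for x_shard in split_rows(x, 2)]
dp_result = [row for shard in dp_shards for row in shard]   # concat along batch

# Model parallelism: split w along "out"; x has no "out" dim, so replicate it.
mp_shards = [matmul(x, w_shard) for w_shard in split_cols(w, 2)]
mp_result = [sum(rows, []) for rows in zip(*mp_shards)]     # concat along out

assert dp_result == reference
assert mp_result == reference
```

Both layouts reproduce the unsharded result; the only difference is which named dimension gets split and which operand gets replicated.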