tensorflow / mesh

Mesh TensorFlow: Model Parallelism Made Easier
Apache License 2.0

GPipe vs mesh? #25

Closed den-run-ai closed 4 years ago

den-run-ai commented 5 years ago

Any comments about GPipe, which was supposed to be open-sourced by Google soon?

Looks like both GPipe and Mesh can do model/data parallelism.

matthewygf commented 5 years ago

@denfromufa

it is open sourced here: https://github.com/tensorflow/lingvo/blob/master/lingvo/core/gpipe.py

I might not be correct, but it seems to me that with Mesh, the layout is found by solving an ILP, i.e. finding an optimal layout subject to a number of constraints.

Whereas with GPipe, you define the ops that get partitioned manually.
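To make the "optimal layout under constraints" idea concrete, here is a brute-force toy in plain Python: it assigns each tensor dimension to a mesh axis (or to none), enforces that each mesh axis splits at most one dimension, and picks the assignment minimizing per-device memory. This is only an illustrative sketch of the idea, not Mesh TensorFlow's actual solver; the dimension and axis names are made up.

```python
from itertools import product

# Toy layout search: assign each tensor dimension to a mesh axis (or to
# no axis), subject to the constraint that each mesh axis splits at most
# one dimension; pick the assignment with the lowest per-device memory.
tensor_dims = {"batch": 64, "hidden": 1024}   # dimension name -> size
mesh_axes = {"rows": 4, "cols": 2}            # mesh axis name -> size

def per_device_cost(assignment):
    """Per-device element count under a {dim: axis-or-None} assignment."""
    cost = 1
    for dim, size in tensor_dims.items():
        axis = assignment[dim]
        split = mesh_axes[axis] if axis else 1
        cost *= size // split
    return cost

best = None
choices = [None] + list(mesh_axes)
for combo in product(choices, repeat=len(tensor_dims)):
    assignment = dict(zip(tensor_dims, combo))
    used = [a for a in combo if a is not None]
    if len(used) != len(set(used)):   # constraint: axis used at most once
        continue
    cost = per_device_cost(assignment)
    if best is None or cost < best[1]:
        best = (assignment, cost)

print(best)
```

Here the search splits `batch` over the 4-way axis and `hidden` over the 2-way axis, leaving 64/4 × 1024/2 = 8192 elements per device instead of 65536.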

den-run-ai commented 5 years ago

@matthewygf let's hope that Google releases the image classification sample soon and also any basic documentation on how to use GPipe. Right now it seems to be deeply integrated with sequence models only.

zaccharieramzi commented 3 years ago

From what I understand, GPipe is actually pipelining, that is, model sequentialism + data parallelism. Basically, you place successive layers on successive GPUs, and once GPU #1 is done with batch #1 you feed it batch #2. This is illustrated in Figure 2.c of their paper.
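The schedule above, and the "bubble" it leaves, can be sketched in a few lines of plain Python (assuming the simplest case: forward pass only, one time unit per stage per micro-batch):

```python
# Sketch of GPipe-style pipelining: K pipeline stages, M micro-batches,
# one time unit per stage per micro-batch. Stage s starts micro-batch m
# at time s + m, so the whole pipeline takes K + M - 1 steps and the
# idle "bubble" fraction of each GPU is (K - 1) / (K + M - 1).

def pipeline_schedule(num_stages, num_microbatches):
    """Return {(stage, microbatch): start_time} for a simple pipeline."""
    return {(s, m): s + m
            for s in range(num_stages)
            for m in range(num_microbatches)}

def bubble_fraction(num_stages, num_microbatches):
    total_steps = num_stages + num_microbatches - 1
    busy_steps = num_microbatches      # each stage is busy for M steps
    return 1 - busy_steps / total_steps

sched = pipeline_schedule(4, 8)
print(max(sched.values()) + 1)   # 11 steps total
print(bubble_fraction(4, 8))     # 3/11 of each GPU's time is idle
```

With more micro-batches per batch the bubble shrinks, which is exactly why GPipe splits each mini-batch into micro-batches.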

Mesh, however, is true model parallelism in the sense that you really define a distributed operation, say a convolution, and distribute it across different GPUs. So you won't suffer from GPipe's bubble: except when communicating, all your GPUs will be in use.
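A minimal sketch of that distinction, in plain Python with lists standing in for device-local tensors: a matmul's output columns are sharded across "devices", every device works on the same batch simultaneously, and the only coordination is the concatenation (communication) at the end. The helper names are made up for illustration.

```python
# Toy model parallelism: shard a matrix multiply's output columns
# across "devices" (here, plain Python shards). Every device computes
# its shard of the same batch at the same time, so there is no
# pipeline bubble -- only a concat (communication) at the end.

def matmul(x, w):
    """Dense matmul on nested lists: x is (n, k), w is (k, m)."""
    return [[sum(xi[t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))]
            for xi in x]

def split_cols(w, num_devices):
    """Shard w column-wise into num_devices equal pieces."""
    step = len(w[0]) // num_devices
    return [[row[d * step:(d + 1) * step] for row in w]
            for d in range(num_devices)]

x = [[1, 2], [3, 4]]                  # batch of 2, 2 features
w = [[1, 0, 2, 0], [0, 1, 0, 2]]      # 2 x 4 weight matrix

shards = split_cols(w, 2)             # one shard per "device"
partials = [matmul(x, ws) for ws in shards]
# "all-gather": concatenate each device's column shard per row
y = [sum((p[i] for p in partials), []) for i in range(len(x))]

print(y)   # identical to the unsharded matmul(x, w)
```

Each "device" holds only half the weights and produces half the output columns, yet the result matches the unsharded computation, which is the sense in which the operation itself is distributed.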