Open assij opened 3 years ago
I have the same question.
The same question! What is the status of that feature? In the paper https://arxiv.org/pdf/1811.02084.pdf you mention "Implementation of SPMD programming on CPU/GPU clusters" (Future Work). Is the project dead? @adarob @dustinvtran ?
@nshazeer to comment.
While this project is not dead, I would not expect to see significant new features added. There are some TF2- and JAX-based libraries inspired by Mesh TensorFlow under development that will have this functionality. They will also be more "production-ready", i.e., better supported and documented :)
https://github.com/tensorflow/lingvo may also support this now.
@adarob do you have a ballpark release date for these libraries?
Early 2021 for JAX.
Hi, does Mesh TensorFlow support multi-node training (i.e., each node has some number of GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to train on all 2 × 8 = 16 GPUs. How do I configure Mesh TensorFlow to train in a multi-node setup?
Thanks