rzambre opened this issue 5 years ago
In an earlier version I indeed used MPI_THREAD_MULTIPLE
to have multiple computation threads perform their own communication, thereby reducing the load on the communication thread. It turned out to be too unstable at that point in time, as the various MPI distributions would give random errors and deadlocks. It would be worthwhile to explore this again in a future version once the code has been converted to support the TF 2.0 C API.
I see. Do you remember which MPI libraries you experimented with?
With multiple threads participating in communication, there exists a design space that could explore the use of separate communicators, tags, etc. to expose parallel communication to the MPI library. Is there a communication kernel mini-application or microbenchmark that captures the communication pattern of TensorFlow? That would serve well to explore the performance of the different strategies in the design space of parallel MPI communication.
If a mini-app isn't available, I would be happy to help with writing a mini-app that captures the communication pattern of TensorFlow.
https://github.com/tensorflow/networking/blob/master/tensorflow_networking/mpi/mpi_utils.cc#L56
I see the use of MPI_THREAD_MULTIPLE has been commented out. From my understanding of the current design of exchanging data with MPI, we do not require MPI_THREAD_MULTIPLE since a dedicated thread is responsible for communication. Are there future plans of having multiple threads perform communication simultaneously (once MPI implementations better support MPI_THREAD_MULTIPLE, of course)? If so, is it more likely that we have dedicated communication threads, or is it possible that the computation threads also perform communication?