Closed kailaix closed 4 years ago
It's not that simple. The problem with blocking comms is that they imply a synchronization (unless buffering is used), so even if the communication itself is fast there may be substantial delays. Regardless, blocking is much simpler, so it's a good starting point.
Add MPI adjoint features.
For scientific computing, a good algorithm should have small communication overhead. Therefore, blocking send and receive should not perform much worse than nonblocking ones. The MPI adjoint feature focuses only on blocking send and receive.
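As a rough illustration of what an adjoint for blocking send and receive means, here is a minimal sketch. It is not this PR's implementation and uses no real MPI: two threads stand in for two ranks, and the `channels`, `send`, `recv`, `rank0`, `rank1`, and `run` names are all hypothetical. The point it shows is the usual pairing: the backward pass of a send is a receive of the gradient from the destination, and the backward pass of a receive is a send of the gradient to the source.

```python
import queue
import threading

# Hypothetical stand-in for an MPI communicator: one FIFO channel per direction.
# Note: Queue.put on an unbounded queue returns immediately, so this behaves
# like a *buffered* send; Queue.get blocks like a blocking receive.
channels = {(0, 1): queue.Queue(), (1, 0): queue.Queue()}
results = {}

def send(value, src, dest):
    channels[(src, dest)].put(value)

def recv(src, dest):
    return channels[(src, dest)].get()  # blocks until a message arrives

def rank0(x):
    send(x, 0, 1)                    # forward pass: send x to rank 1
    results["grad_x"] = recv(1, 0)   # backward pass: adjoint of send is a receive

def rank1():
    x = recv(0, 1)                   # forward pass: receive x from rank 0
    results["loss"] = x ** 2         # toy loss computed on rank 1
    grad_x = 2.0 * x                 # d(loss)/dx
    send(grad_x, 1, 0)               # backward pass: adjoint of receive is a send

def run(x):
    results.clear()
    threads = [threading.Thread(target=rank0, args=(x,)),
               threading.Thread(target=rank1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["loss"], results["grad_x"]
```

Calling `run(3.0)` runs one forward and one backward exchange; the gradient that arrives back on "rank 0" is the derivative of the loss computed on "rank 1".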
The idea is to implement 6 functions as custom operators.
The first four functions should implement gradient backpropagation.