Closed (abouteiller closed 6 years ago)
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
[c00.cauchy:29481] [[9733,1],1] ompi_rbcast_bml_send_complete_cb: status -12
[c00.cauchy:29481] PML:OB1: the error handler was invoked by the tcp BTL for proc [[9733,1],0] with info Socket closed
[c00.cauchy:29481] [[9733,1],1] ompi: Process [[9733,1],0] failed (state = -57).
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
Possibly a bug inherited from upstream. George is working on a fix for Open MPI.
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
Problem solved in patches 0237a707 and 61c5954f.
Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
The changes made to make the TCP BTL resilient appear to be overly aggressive: in some instances the BTL triggers failure events for operations involving processes that are still healthy.