Closed by abouteiller 6 years ago
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
This is not a harmless error message. While this "retry" activity is ongoing, progress stalls.
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
too many retries sending message to 0x0003:0x00d00412, giving up
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
Progress stalls completely after such reports arrive. Needs to be fixed.
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
Principal issue resolved in 04f61d22 and 79aca0bb: credits were not returned from communication with failed procs.
Issue remains for UDCM failures (and presumably for RDMACM as well).
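To illustrate why unreturned credits stall progress, here is a toy model in Python. It is only a sketch under assumptions: `CreditPool`, `send`, `ack`, and `peer_failed` are hypothetical names, and this is not the code from the commits above. It assumes a shared pool of send credits where each unacknowledged send holds one credit until the peer acknowledges it.

```python
from collections import defaultdict


class CreditPool:
    """Toy model of a shared pool of send credits (hypothetical, not the real code)."""

    def __init__(self, credits):
        self.available = credits
        # peer -> number of credits held by unacknowledged sends to that peer
        self.in_flight = defaultdict(int)

    def send(self, peer):
        """Consume one credit for a send to `peer`; return False if none are left."""
        if self.available == 0:
            return False  # the sender has to wait: this is where progress stalls
        self.available -= 1
        self.in_flight[peer] += 1
        return True

    def ack(self, peer):
        """A live peer acknowledged one message, so its credit comes back."""
        if self.in_flight[peer] > 0:
            self.in_flight[peer] -= 1
            self.available += 1

    def peer_failed(self, peer):
        """Reclaim every credit held by sends to a failed peer; it will never ack."""
        reclaimed = self.in_flight.pop(peer, 0)
        self.available += reclaimed
        return reclaimed


pool = CreditPool(credits=2)
pool.send("dead")
pool.send("dead")
stalled = not pool.send("alive")      # both credits are held by the dead peer
reclaimed = pool.peer_failed("dead")  # hand the credits back on failure detection
resumed = pool.send("alive")          # sends to live peers can proceed again
```

Without the `peer_failed` step, the credits held by sends to the dead peer never come back, so sends to live peers stay blocked as well.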
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
UDCM issue resolved in 2fb5440a. RDMACM does not appear to have a similar code path, and the issue could not be reproduced on our EDR hardware.
Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).