linux-rdma / perftest

Infiniband Verbs Performance Tests
Other
536 stars 273 forks source link

Fix race in non-rdmacm ctx_close_connection() #220

Closed rolandd closed 11 months ago

rolandd commented 11 months ago

When rdmacm is not used, ctx_close_connection() does a handshake and then sends "done" over the socket before closing it. This is racy and can lead to a spurious error:

HOST A                                  HOST B

ctx_close_connection()
                                        ctx_close_connection()
  ctx_hand_shake() [ succeeds ]
                                          ctx_hand_shake() [ succeeds ]
                                          write(sockfd, "done")
                                          close(sockfd)

  write(sockfd, "done") <-- fails since HOST B has closed the
                            socket and replies with a TCP RST

Fix this simply by deleting the write(). The ctx_hand_shake() already ensures the two sides are in sync and can proceed to the close().

HassanKhadour commented 11 months ago

Hi,

Thanks for your contribution!. Can you place share a case where this race is likely to happen?

rolandd commented 11 months ago

I don't have a specific case to reproduce this, but we have an automated smoke test that runs ib_send_lat quite frequently and this bug is making the test flaky. I'm not sure if it's test infrastructure or what, but at least for us, one side of the test can run ahead enough that the socket is closed before the other side sends "done". Since there is no matching read() or anything else that consumes the "done" message, I think it makes sense to delete this very old and buggy code.

HassanKhadour commented 11 months ago

Thanks for clarification, merged!