sahlberg / libnfs

NFS client library
Other

Avoid rpc_mutex deadlock #462

Closed: linuxsmiths closed this 4 months ago

linuxsmiths commented 4 months ago

rpc_set_error() acquires rpc_mutex, so it must not be called from any function that already holds that mutex. rpc_write_to_socket() was doing exactly that, which resulted in a deadlock. I found this while running the stress test suite, where I continuously kill the connection with "ss -K" while a data transfer is in progress.
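A minimal sketch of why this self-deadlocks, assuming POSIX threads and a default (non-recursive) mutex standing in for rpc_mutex. The names `demo_mutex`, `set_error_probe()` and `write_path_probe()` are hypothetical, not libnfs functions. The sketch uses `pthread_mutex_trylock()` so that it reports `EBUSY` instead of hanging; a blocking lock in the same situation (as the deadlock implies the real code uses) never returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for rpc_mutex: default, non-recursive type. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical analogue of rpc_set_error(): it takes the lock itself.
 * trylock is used only so the demo can report the problem; a blocking
 * pthread_mutex_lock() here would hang forever when the caller holds it. */
static int set_error_probe(void)
{
    int rc = pthread_mutex_trylock(&demo_mutex);
    if (rc == 0)
        pthread_mutex_unlock(&demo_mutex);
    return rc; /* 0 = lock acquired, EBUSY = already held (by anyone, incl. us) */
}

/* Hypothetical analogue of rpc_write_to_socket(): holds the lock and then
 * hits an error, calling the locking error setter from the same thread. */
static int write_path_probe(void)
{
    int rc;

    pthread_mutex_lock(&demo_mutex);
    rc = set_error_probe(); /* same thread, lock already held */
    pthread_mutex_unlock(&demo_mutex);
    return rc;
}
```

Called on its own, `set_error_probe()` acquires the lock; called from inside the locked write path, the same thread finds its own lock busy.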

To fix this, I added nolock variants of both the rpc and nfs set-error functions for callers that already hold rpc_mutex. I audited the code and fixed one more occurrence. Since the audit may not be complete, the mutex is now created as an "error checking" mutex so that such locking violations are caught.
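A minimal sketch of the nolock pattern, assuming a simplified context struct. `struct ctx`, `ctx_set_error()`, `ctx_set_error_nolock()` and `write_to_socket()` are hypothetical names, not the functions this PR adds to libnfs. Both setters share one `va_list` helper, so the locking wrapper only adds lock/unlock around the same formatting code, and any path that already holds the lock calls the nolock variant.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an RPC context: only what the sketch needs. */
struct ctx {
    pthread_mutex_t mutex; /* plays the role of rpc_mutex */
    char error[256];       /* last error message */
};

/* Shared formatter. Caller must hold c->mutex. */
static void ctx_vset_error_nolock(struct ctx *c, const char *fmt, va_list ap)
{
    vsnprintf(c->error, sizeof(c->error), fmt, ap);
}

/* For callers that ALREADY hold c->mutex. */
static void ctx_set_error_nolock(struct ctx *c, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    ctx_vset_error_nolock(c, fmt, ap);
    va_end(ap);
}

/* For callers that do NOT hold c->mutex: takes it around the update. */
static void ctx_set_error(struct ctx *c, const char *fmt, ...)
{
    va_list ap;
    int rc = pthread_mutex_lock(&c->mutex);
    assert(rc == 0);
    (void)rc;
    va_start(ap, fmt);
    ctx_vset_error_nolock(c, fmt, ap);
    va_end(ap);
    pthread_mutex_unlock(&c->mutex);
}

/* Hypothetical analogue of rpc_write_to_socket(): runs with the mutex held,
 * so on failure it must use the nolock setter. simulated_errno != 0 fakes
 * a failed write (e.g. after the connection was killed). */
static int write_to_socket(struct ctx *c, int simulated_errno)
{
    int ret = 0;

    pthread_mutex_lock(&c->mutex);
    if (simulated_errno != 0) {
        ctx_set_error_nolock(c, "write failed: %s", strerror(simulated_errno));
        ret = -1;
    }
    pthread_mutex_unlock(&c->mutex);
    return ret;
}
```

Calling `ctx_set_error()` inside `write_to_socket()` instead would re-lock a mutex the thread already owns, which is the bug described in this PR.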

I didn't use a recursive mutex because recursive mutexes encourage bad programming practice and can silently hide bugs like this one.
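A minimal sketch contrasting the two mutex types, assuming POSIX threads. `init_mutex()` and `relock_result()` are hypothetical helpers, not libnfs code. Re-locking from the owning thread (the same pattern as the rpc_write_to_socket() to rpc_set_error() path) returns `EDEADLK` for a `PTHREAD_MUTEX_ERRORCHECK` mutex, but succeeds silently for a `PTHREAD_MUTEX_RECURSIVE` one.

```c
#define _XOPEN_SOURCE 700 /* expose pthread_mutexattr_settype() in strict modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise m with the given type, e.g. PTHREAD_MUTEX_ERRORCHECK. */
static int init_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock the mutex twice from the same thread and return the second result.
 * Do NOT pass PTHREAD_MUTEX_NORMAL: the second lock would hang forever. */
static int relock_result(int type)
{
    pthread_mutex_t m;
    int first, second;

    if (init_mutex(&m, type) != 0)
        return -1;
    first = pthread_mutex_lock(&m);
    assert(first == 0);
    (void)first;
    second = pthread_mutex_lock(&m); /* same thread, lock already held */
    if (second == 0)                 /* recursive: lock count is now 2 */
        pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

With the error-checking type the violation surfaces as an error code that can be logged or asserted on; with the recursive type the second lock returns 0 and the mistake goes unnoticed.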

Testing done: with this fix, I ran 100+ iterations of a 100 GB file transfer while killing all connections every 5 seconds.

linuxsmiths commented 4 months ago

@sahlberg, I've done a basic audit. If you get a chance, please audit as well to check whether we call rpc_set_error()/nfs_set_error() anywhere that already holds rpc_mutex.

FYI, I'll be travelling for the next couple of days, so my responses may be delayed.