Open JonathanLennox opened 1 year ago
I have reproduced the exact same issue in a test suite with GStreamer's version of usrsctplib. In my case the sctp-timer thread ticks with a SCTP_TIMER_TYPE_SHUTDOWN
event, decrements a reference on the socket, and eventually calls sofree(), which triggers this same assert:
KASSERT(so->so_count == 0, ("sodealloc(): so_count %d", so->so_count));
@tuexen is there anything else we can help you with to figure out this issue? I can reproduce it very often with a test that stresses the setup and shutdown of the sockets, so it won't be a problem to dig out more information.
@tuexen: I have a hypothesis here.
The userspace code in sctp_close locks SCTP_INP_WLOCK when it sets SCTP_PCB_FLAGS_SOCKET_GONE in sctp_flags; however, the userspace code to set upcall_socket in sctp_timeout_handler checks that bit in sctp_flags without, as far as I can tell, acquiring that lock. Thus, I suspect, under load, the sctp_timeout_handler thread can get suspended immediately after the check of SCTP_PCB_FLAGS_SOCKET_GONE, and a usrsctp_close call can sneak in.
However, I don't understand the lock hierarchy here. Would it be safe to lock SCTP_INP in sctp_timeout_handler, or could that cause a lock inversion?
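To make the hypothesis concrete, here is a minimal sketch of the pattern I mean. It is not the usrsctp code: every name in it (fake_inp, fake_socket, fake_close, fake_timer_get_socket, FLAG_SOCKET_GONE) is a hypothetical stand-in, with a plain pthread mutex playing the role of the INP lock. It only shows the flag check and the reference being taken under the same lock that the close path holds when it sets the flag, so that a close cannot slip in between the two. It says nothing about whether taking the real lock in sctp_timeout_handler respects the lock hierarchy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real usrsctp types. */
#define FLAG_SOCKET_GONE 0x1u   /* plays the role of SCTP_PCB_FLAGS_SOCKET_GONE */

struct fake_socket {
    int refcount;               /* plays the role of so_count */
};

struct fake_inp {
    pthread_mutex_t lock;       /* plays the role of the INP lock */
    unsigned flags;             /* plays the role of sctp_flags */
    struct fake_socket *so;
};

/* Close path: the "socket gone" flag is set while holding the lock. */
void fake_close(struct fake_inp *inp)
{
    pthread_mutex_lock(&inp->lock);
    inp->flags |= FLAG_SOCKET_GONE;
    pthread_mutex_unlock(&inp->lock);
}

/*
 * Timer path: the flag check and the reference increment happen inside
 * one critical section. Checking the flag without the lock would leave
 * a window between the check and the increment in which fake_close()
 * could run on another thread.
 * Returns the socket with an extra reference, or NULL if it is gone.
 */
struct fake_socket *fake_timer_get_socket(struct fake_inp *inp)
{
    struct fake_socket *so = NULL;

    pthread_mutex_lock(&inp->lock);
    if ((inp->flags & FLAG_SOCKET_GONE) == 0 && inp->so != NULL) {
        so = inp->so;
        assert(so->refcount > 0);   /* a socket still in use must be referenced */
        so->refcount++;
    }
    pthread_mutex_unlock(&inp->lock);
    return so;
}
```

In this sketch a caller that gets a non-NULL socket back owns one extra reference and has to drop it when the upcall is done. Once fake_close() has run, the timer path gets NULL and never touches the socket.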
This is the same environment as https://github.com/sctplab/usrsctp/issues/673, and I suspect the same root cause, but a different crash - perhaps this one will give more information.
I am getting an assert crash in usrsctp with the assertion
which certainly looks like the socket object has been freed and re-used by another allocation. In my logs I can see that usrsctp_close was called just before the crash.
The crash is in the sctp_timeout_handler, and the stack is
The state inside sctp_timeout_handler is
I suspect it's not very useful because the memory has probably been re-used, but in case there's anything meaningful still lingering in later fields, the socket object is
In case it's useful, here are the dereferences of the local pointers referenced by sctp_timeout_handler: