SCTP_INP_WLOCK held while delivering SCTP_SEND_FAILED_EVENT notification during SCTP restart

sctplab / usrsctp

A portable SCTP userland stack

BSD 3-Clause "New" or "Revised" License

677 stars 280 forks source link

SCTP_INP_WLOCK held while delivering SCTP_SEND_FAILED_EVENT notification during SCTP restart #537

Open taylor-b opened 4 years ago

taylor-b commented 4 years ago

When handling "case A in Section 5.2.4 Table 2: XXMM (peer restarted)", SCTP_INP_WLOCK is acquired before calling sctp_report_all_outbound, which will deliver SCTP_SEND_FAILED_EVENT notifications.

This means that while processing the notification, it's not safe for the application to use any API that may attempt to acquire the INP lock. In our case we were trying to call usrsctp_getladdrs.

Would you consider this a bug, or is the general rule "you shouldn't call any usrsctp APIs from within a callback"?

tuexen commented 4 years ago

I would consider it a bug. However, that should not occur when using the upcall API. These kind of issues where the motivation to implement the upcall API.

taylor-b commented 4 years ago

Another problem: there's a lock order inversion, because sctp_invoke_recv_callback reacquires the TCB lock:

        SCTP_TCB_UNLOCK(stcb);
        if (inp_read_lock_held == 0) {
            SCTP_INP_READ_UNLOCK(inp);
        }
        inp->recv_callback(so, addr, buffer, length, rcv, flags, inp->ulp_info);
        SCTP_TCB_LOCK(stcb);

... while holding SCTP_TCB_SEND_LOCK. Normally they're acquired in the reverse order, so this could result in a deadlock.

Do you think you'll fix this, or should we just switch to the upcall API?

tuexen commented 4 years ago

I can see if it can easily be fixed. I would suggest to move to the upcall API, since that conceptually deals with this kind of problems. However, it does not have the same amount of test coverage...

tuexen commented 4 years ago

BTW: Why are you enabling the send failed notification? Are you actually doing anything with the messages given back?

taylor-b commented 4 years ago

We were only using it for logging, so I'm just temporarily disabling it for now.