nspcc-dev / neofs-node

NeoFS is a decentralized distributed object storage integrated with the Neo blockchain
https://fs.neo.org
GNU General Public License v3.0
31 stars 38 forks source link

morph: Fix subscription deadlock #2720

Closed carpawell closed 7 months ago

carpawell commented 7 months ago

Scenario:

  1. at least one subscription has been performed
  2. another subscription is being done
  3. a notification from one of the 0. point's subs is received

If 2. happens b/w 0. and 1. a deadlock appears since the notification routing process is locked on the subscription lock while the subscription lock cannot be unlocked since the subscription RPC cannot be done before the just-arrived notification is handled (read from the neo-go subscription channel). Relates #2559.

codecov[bot] commented 7 months ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (46f2a82) 28.81% compared to head (d8beaf7) 28.80%. Report is 1 commits behind head on master.

Files Patch % Lines
pkg/morph/client/notifications.go 0.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #2720 +/- ## ========================================== - Coverage 28.81% 28.80% -0.01% ========================================== Files 415 415 Lines 32397 32397 ========================================== - Hits 9335 9333 -2 - Misses 22228 22229 +1 - Partials 834 835 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

carpawell commented 7 months ago

to me its ok that subscribing ops acquire subs lock. And they must not cause any deadlocks. If a deadlock happens - it's almost definitely the routing/handling fault and should be resolve there

I would say https://github.com/nspcc-dev/neofs-node/commit/5655887bca9c1a5f275461ff5804b50a17ee7c1a was not necessary. We can add the third lock but that is extremely hard and not necessary for the client, it would do too many things then.

I do not know what to say here, is it the 5th or the 10th time I am fixing a deadlock caused by the impossibility of doing any RPC calls if nothing reads subs channel? neo-go is changing, neofs is changing but still, we face an intuitive regular code that deadlocks.

cthulhu-rider commented 7 months ago

5th or the 10th time I am fixing a deadlock caused by the impossibility of doing any RPC calls if nothing reads subs channel

let's get rid of this minefield once and for all. Half actions don't work

temporary unlocked subscription is not a solution to me. Nothing prevents the listener from discarding events that are not yet in demand

carpawell commented 7 months ago

let's get rid of this minefield once and for all

What do you suggest? We need to hold a few connections and be able to switch b/w them so a lock is there. But neo-go requires to read subs. But making a new sub is an RPC that is impossible without reading the subs.

temporary unlocked subscription is not a solution to me

If you prohibit subscription change, you may experience a connection switch (and a subscription lock) while trying to make a new subscription (another subscription lock).

roman-khimov commented 7 months ago

You can make routeNotifications not require s.subs lock for regular (non-reconnecting) operation. Likely this will solve the problem. Locking here looks correct.

carpawell commented 7 months ago

You can make routeNotifications not require s.subs lock for regular (non-reconnecting) operation. Likely this will solve the problem.

Legit. But I do not like it cause that will work only based on some info we know: no routines change the channels if no RPC failures happen (or they even happen but their handling is locked cause we are reading some internal channels). Anyway, done. Small diff, fixes the problem.