moby / libnetwork

networking for containers
Apache License 2.0
2.16k stars 881 forks source link

Fix deadlock between getSvcRecords and processEndpointDelete #2679

Closed corhere closed 1 year ago

corhere commented 1 year ago

We had some hosts with quite a bit of cycling containers that ocassionally causes docker daemons to lock up. Most prominently docker run commands do not respond and nothing happens anymore.

Looking at the stack trace the following is at least likely sometimes a cause to that: Two goroutines g0 and g1 can race against each other:

3./4. are deadlocked against each other since the other goroutine holds the lock they need.

References https://github.com/moby/libnetwork/blob/b5dc37037049d9b9ef68a3c4611e5eb1b35dd2af/network.go

(cherry picked from commit moby/moby@7c97896747726554165480d102d9e46c54334cba)

neersighted commented 1 year ago

Didn't we already backport this in https://github.com/moby/libnetwork/commit/51413ef1371ffe29ffd341c80574a3bc1c585d90 and https://github.com/moby/libnetwork/pull/2670?

corhere commented 1 year ago

You just fell into the same trap as @thaJeztah. #2670 backported the deadlock fix for processEndpointCreate; this backports the fix for processEndpointDelete

neersighted commented 1 year ago

Aha, just saw https://github.com/moby/moby/issues/46661#issuecomment-1768735326

cpuguy83 commented 1 year ago

I'm not sure I understand the description here. It says this is a 20.10 backport but this is merging against master?

corhere commented 1 year ago

@cpuguy83 that's right, moby/libnetwork@master. Ain't 20.10 libnetwork backports fun?

cpuguy83 commented 1 year ago

lol, this is libnetwork not moby/moby. Wow I'm quick.