Open · Thegaram opened 5 years ago
Hi.
Looking at the output, it seems you are right.
Somehow the l.mu got locked, so we are stuck on deadlock.go:330.
I do not see how it could happen.
I would look at the dump of the other goroutines, to see if any of them are doing something in deadlock.go (under the l.mu lock).
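If it helps, something like the sketch below (not part of go-deadlock; the SIGUSR1 trigger is just an arbitrary choice) dumps every goroutine's stack on demand, which makes it easier to spot who is sitting inside deadlock.go while holding l.mu:

package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpGoroutinesOnSignal prints a full stack trace of every goroutine to stderr
// each time the process receives SIGUSR1.
func dumpGoroutinesOnSignal() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	go func() {
		for range c {
			// debug=2 produces the same verbose per-goroutine format as a crash dump.
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

func main() {
	dumpGoroutinesOnSignal()
	select {} // the real program would go here
}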
Or, if you have something reproducible, I'd love to look at it.
I found one bug in the library and hopefully fixed it in https://github.com/sasha-s/go-deadlock/commit/1595213edefa28ca5047b00340c63557f9c051d0. Not sure it would solve your problem, but could you pull and try again?
I just implemented a deadlock detector with easy-to-prove correctness:
I'm facing the same problem while debugging Mercure:
POTENTIAL DEADLOCK:
Previous place where the lock was grabbed
goroutine 5212824 lock 0xc0001aa780
hub/bolt_transport.go:120 hub.(*BoltTransport).Dispatch { t.Lock() } <<<<<
hub/subscribe.go:201 hub.(*Hub).dispatchSubscriptionUpdate { h.transport.Dispatch(u) }
hub/subscribe.go:180 hub.(*Hub).shutdown { h.dispatchSubscriptionUpdate(s, false) }
hub/subscribe.go:49 hub.(*Hub).SubscribeHandler { return }
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2012 http.HandlerFunc.ServeHTTP { f(w, r) }
../go/pkg/mod/github.com/gorilla/mux@v1.7.4/mux.go:210 mux.(*Router).ServeHTTP { handler.ServeHTTP(w, req) }
../go/pkg/mod/github.com/gorilla/handlers@v1.4.2/cors.go:54 handlers.(*cors).ServeHTTP { ch.h.ServeHTTP(w, r) }
../go/pkg/mod/github.com/unrolled/secure@v1.0.8/secure.go:177 secure.(*Secure).Handler.func1 { h.ServeHTTP(w, r) }
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2012 http.HandlerFunc.ServeHTTP { f(w, r) }
../go/pkg/mod/github.com/gorilla/handlers@v1.4.2/logging.go:45 handlers.loggingHandler.ServeHTTP { h.handler.ServeHTTP(logger, req) }
../go/pkg/mod/github.com/gorilla/handlers@v1.4.2/recovery.go:78 handlers.recoveryHandler.ServeHTTP { h.handler.ServeHTTP(w, req) }
../go/pkg/mod/github.com/gorilla/mux@v1.7.4/mux.go:210 mux.(*Router).ServeHTTP { handler.ServeHTTP(w, req) }
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2807 http.serverHandler.ServeHTTP { handler.ServeHTTP(rw, req) }
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:1895 http.(*conn).serve { serverHandler{c.server}.ServeHTTP(w, w.req) }
Have been trying to lock it again for more than 30s
goroutine 5373639 lock 0xc0001aa780
Here is what goroutine 5212824 doing now
goroutine 5212824 [semacquire]:
sync.runtime_SemacquireMutex(0xc00000ee24, 0x0, 0x1)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00000ee20)
/usr/local/Cellar/go/1.14.3/libexec/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
/usr/local/Cellar/go/1.14.3/libexec/src/sync/mutex.go:81
github.com/sasha-s/go-deadlock.(*lockOrder).postUnlock(0xc00000ee20, 0x16feaa0, 0xc0001aa780)
/Users/dunglas/workspace/go/pkg/mod/github.com/sasha-s/go-deadlock@v0.2.1-0.20190427202633-1595213edefa/deadlock.go:330 +0xa6
github.com/sasha-s/go-deadlock.postUnlock(...)
/Users/dunglas/workspace/go/pkg/mod/github.com/sasha-s/go-deadlock@v0.2.1-0.20190427202633-1595213edefa/deadlock.go:167
github.com/sasha-s/go-deadlock.(*RWMutex).Unlock(0xc0001aa780)
/Users/dunglas/workspace/go/pkg/mod/github.com/sasha-s/go-deadlock@v0.2.1-0.20190427202633-1595213edefa/deadlock.go:129 +0x66
github.com/dunglas/mercure/hub.(*BoltTransport).Dispatch(0xc0001aa780, 0xc001522600, 0x0, 0x0)
/Users/dunglas/workspace/mercure/hub/bolt_transport.go:133 +0x251
github.com/dunglas/mercure/hub.(*Hub).dispatchSubscriptionUpdate(0xc000298420, 0xc002c1f5e0, 0xc000f81000)
/Users/dunglas/workspace/mercure/hub/subscribe.go:201 +0x2f2
github.com/dunglas/mercure/hub.(*Hub).shutdown(0xc000298420, 0xc002c1f5e0)
/Users/dunglas/workspace/mercure/hub/subscribe.go:180 +0x4c
github.com/dunglas/mercure/hub.(*Hub).SubscribeHandler(0xc000298420, 0x2bb4068, 0xc00102f260, 0xc000fae900)
/Users/dunglas/workspace/mercure/hub/subscribe.go:49 +0x66e
net/http.HandlerFunc.ServeHTTP(0xc000280ea0, 0x2bb4068, 0xc00102f260, 0xc000fae900)
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2012 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001dc180, 0x2bb4068, 0xc00102f260, 0xc002d4f500)
/Users/dunglas/workspace/go/pkg/mod/github.com/gorilla/mux@v1.7.4/mux.go:210 +0xe2
github.com/gorilla/handlers.(*cors).ServeHTTP(0xc0001942d0, 0x2bb4068, 0xc00102f260, 0xc002d4f500)
/Users/dunglas/workspace/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/cors.go:54 +0x102a
github.com/unrolled/secure.(*Secure).Handler.func1(0x2bb4068, 0xc00102f260, 0xc002d4f500)
/Users/dunglas/workspace/go/pkg/mod/github.com/unrolled/secure@v1.0.8/secure.go:177 +0xdf
net/http.HandlerFunc.ServeHTTP(0xc00000ff80, 0x2bb4068, 0xc00102f260, 0xc002d4f500)
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2012 +0x44
github.com/gorilla/handlers.loggingHandler.ServeHTTP(0x18849a0, 0xc000010020, 0x1885560, 0xc00000ff80, 0x17b87b0, 0x18907a0, 0xc002c1f500, 0xc002d4f500)
/Users/dunglas/workspace/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/logging.go:45 +0x243
github.com/gorilla/handlers.recoveryHandler.ServeHTTP(0x18850a0, 0xc000298de0, 0x1883d00, 0xc0002a1b20, 0x1, 0x18907a0, 0xc002c1f500, 0xc002d4f500)
/Users/dunglas/workspace/go/pkg/mod/github.com/gorilla/handlers@v1.4.2/recovery.go:78 +0xce
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001dc0c0, 0x18907a0, 0xc002c1f500, 0xc000a19e00)
/Users/dunglas/workspace/go/pkg/mod/github.com/gorilla/mux@v1.7.4/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc0001c62a0, 0x18907a0, 0xc002c1f500, 0xc000a19e00)
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0xc0035ebcc0, 0x1892f60, 0xc002980200)
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
/usr/local/Cellar/go/1.14.3/libexec/src/net/http/server.go:2933 +0x35c
I'm using the same version of the library. Thanks for writing such a useful tool, by the way!
v0.2.1 (1595213edefa)
This lockup occurs in the caching layer, which is heavily loaded at about 100,000 requests per second. It does not happen immediately: sometimes it takes up to a week, and sometimes it happens the day after a restart.
POTENTIAL DEADLOCK:
Previous place where the lock was grabbed
goroutine 1595446618 lock 0xc00048e7d0
<project>/pkg/stracing/mutex.go:48 stracing.(*Mutex).Lock ??? <<<<<
...
Have been trying to lock it again for more than 5m0s
goroutine 1595163670 lock 0xc00048e7d0
Here is what goroutine 1595446618 doing now
goroutine 1595446618 [semacquire]:
sync.runtime_SemacquireMutex(0xc0001108a4, 0xd00, 0x1)
runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc0001108a0)
sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
sync/mutex.go:81
github.com/sasha-s/go-deadlock.(*lockOrder).postUnlock(0xc0001108a0, 0xc54900, 0xc00048e7d0)
github.com/sasha-s/go-deadlock@v0.2.1-0.20190427202633-1595213edefa/deadlock.go:330 +0xa6
github.com/sasha-s/go-deadlock.postUnlock(...)
github.com/sasha-s/go-deadlock@v0.2.1-0.20190427202633-1595213edefa/deadlock.go:167
github.com/sasha-s/go-deadlock.(*Mutex).Unlock(0xc00048e7d0)
github.com/sasha-s/go-deadlock@v0.2.1-0.20190427202633-1595213edefa/deadlock.go:97 +0x65
...
In the reproduction below the timeout is very small (1 second), while in my real code it is 5 minutes.
package main

import (
	"time"

	"github.com/sasha-s/go-deadlock"
)

func main() {
	deadlock.Opts.DeadlockTimeout = 1 * time.Second

	var mx deadlock.Mutex
	fn := func() {
		mx.Lock()
		mx.Unlock()
	}

	for i := 0; i < 100_000; i++ {
		go func() {
			for {
				fn()
			}
		}()
	}

	select {}
}
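For completeness, the same reproduction can also be run with settings closer to my real code; the sketch below uses the production 5 minute timeout plus the OnPotentialDeadlock callback (an Opts field the library exposes, if I read it correctly) to keep the process running after a report:

package main

import (
	"log"
	"time"

	"github.com/sasha-s/go-deadlock"
)

func main() {
	// Production-like settings: the 5 minute timeout from my real code, plus a
	// callback (assumed Opts field) replacing the default exit-on-report
	// behaviour, so the process keeps running and can be inspected afterwards.
	deadlock.Opts.DeadlockTimeout = 5 * time.Minute
	deadlock.Opts.OnPotentialDeadlock = func() {
		log.Println("potential deadlock reported")
	}

	var mx deadlock.Mutex
	fn := func() {
		mx.Lock()
		mx.Unlock()
	}

	for i := 0; i < 100_000; i++ {
		go func() {
			for {
				fn()
			}
		}()
	}

	select {}
}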
Perhaps the problem is not in this library but somewhere else. In my real case, the mutex protected counting the elements of a slice and appending to it:
// copy code from https://github.com/jaegertracing/jaeger/blob/748bf213d1e5a03ebc4da63601473be3bbe193f4/examples/hotrod/pkg/tracing/mutex.go with my fixes
type Mutex struct {
	...
	waiters     []string
	waitersLock deadlock.Mutex // <--
}

func (sm *Mutex) Lock(ctx context.Context, name string) {
	activeSpan := opentracing.SpanFromContext(ctx) // "activeSpan == noopSpan{}" in production

	sm.waitersLock.Lock() // <-- "Previous place where the lock was grabbed"
	if waiting := len(sm.waiters); waiting > 0 && activeSpan != nil {
		activeSpan.LogFields(...) // <-- func (n noopSpan) LogFields(fields ...log.Field) {}
	}
	sm.waiters = append(sm.waiters, name)
	sm.waitersLock.Unlock() // <-- "Have been trying to lock it again for more than 5m0s"

	sm.realLock.Lock()
	sm.holder = name

	sm.waitersLock.Lock()
	behindLen := len(sm.waiters) - 1
	sm.waitersLock.Unlock()

	if activeSpan != nil {
		activeSpan.LogFields(...)
	}
}
Hey, first of all thanks for the hard work on this great lib!
I'm having trouble interpreting the output below. It suggests goroutine 77264 holds lock 0xc4202a60e0 for a long time, preventing others (like goroutine 77325 and many more) from acquiring it.

However, the output suggests that goroutine 77264 actually got stuck during unlock: raft.go:688 is a deferred mu.Unlock(), and deadlock.go:330 is actually a lock acquire statement in this lib.

Does this mean that the (potential) deadlock is coming from this lib in this case? What would make goroutine 77264 get stuck on that internal lock? (I reproduced the same output with a 1m30s lock timeout.)