probe-lab / zikade

A Go implementation of the libp2p Kademlia DHT specification

Deadlock in QueryBehaviour #57

Closed iand closed 1 year ago

iand commented 1 year ago

This goroutine is holding the QueryBehaviour lock while trying to Notify a Waiter that an EventGetCloserNodesSuccess was received. There are no goroutines selecting on the waiter's channel. I would expect one to be in Coordinator.waitForQuery, called from Coordinator.QueryMessage.

goroutine 6818 [select, 33 minutes]:
github.com/plprobelab/zikade/internal/coord.(*Waiter[...]).Notify(0xc0018abbe0, {0x2eda480?, 0xc0021febd0}, {0x2ec3ca0, 0xc0034ab2d0})
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:126 +0x105
github.com/plprobelab/zikade/internal/coord.(*PooledQueryBehaviour).Notify(0xc00014df80, {0x2eda480?, 0xc00216cf00?}, {0x2ec3be0?, 0xc00253a000?})
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/query.go:189 +0x109f
github.com/plprobelab/zikade/internal/coord.(*NodeHandler).send(0xc00079d880, {0x2eda480, 0xc00216cf00}, {0x2ecf4d0?, 0xc00262e460?})
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/network.go:165 +0x33e
github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1.1()
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:81 +0x108
created by github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:75 +0x7a

There are 8 goroutines waiting on the lock at this point:

goroutine 6990 [sync.Mutex.Lock, 33 minutes]:
sync.runtime_SemacquireMutex(0x11eeaff?, 0x80?, 0xc001f4a390?)
    /home/iand/sdk/go1.20.5/src/runtime/sema.go:77 +0x26
sync.(*Mutex).lockSlow(0xc00014dfd8)
    /home/iand/sdk/go1.20.5/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
    /home/iand/sdk/go1.20.5/src/sync/mutex.go:90
github.com/plprobelab/zikade/internal/coord.(*PooledQueryBehaviour).Notify(0xc00014df80, {0x2eda480?, 0xc001f4a390?}, {0x2ec3d40?, 0xc0015c87d0?})
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/query.go:152 +0x125
github.com/plprobelab/zikade/internal/coord.(*NodeHandler).send(0xc001293c00, {0x2eda480, 0xc001f4a390}, {0x2ecf4f8?, 0xc001293600?})
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/network.go:186 +0x63b
github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1.1()
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:81 +0x108
created by github.com/plprobelab/zikade/internal/coord.(*WorkQueue[...]).Enqueue.func1
    /home/iand/pkg/mod/github.com/plprobelab/zikade@v0.0.0-20231005134401-f9b6f3275245/internal/coord/behaviour.go:75 +0x7a

Somehow we have lost the select that should be reading from the waiter's channel.