libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p
https://github.com/libp2p/specs/tree/master/kad-dht
MIT License
517 stars 221 forks source link

Fixing leaking go routines #850

Closed guillaumemichel closed 1 year ago

guillaumemichel commented 1 year ago

addresses https://github.com/libp2p/go-libp2p-kad-dht/issues/849

depends on https://github.com/libp2p/go-libp2p-kbucket/pull/120

linked: https://github.com/ipfs/kubo/issues/9814

guillaumemichel commented 1 year ago

Some tests are failing randomly, and it is a hell to debug

Jorropo commented 1 year ago

Note the tests timeout.

guillaumemichel commented 1 year ago

Some tests seem to enter randomly an infinite loop, and some seem to always enter an infinite loop, resulting in the tests timing out.

This isn't the case without the new go routine in peerFound introduced in this PR. I don't understand the root cause, and it is hard to debug because of the nature of the tests.

https://github.com/libp2p/go-libp2p-kad-dht/blob/43e8891c60e2f616287bff93a3ff1fa88fb8b87e/dht.go#L669-L713

BigLep commented 1 year ago

I'm not up on the specifics here, but want to understand the testing situation. Do we have regression tests in place for the fix? Is this change causing other tests to fail? I know we want to get this out, but I'd like to understand what it takes to do it right.

guillaumemichel commented 1 year ago

@Jorropo and I debugged the tests, but were not able to get them to work yet. It seems unrelated to the new code, but it wouldn't be wise to push this change. We also uncovered new bugs. More debugging is required to fix the broken tests.

If Kubo has to be released shortly, I would recommend reverting this change and using go-libp2p-kad-dht v0.23.0, and the fix will (hopefully) make it to the next release.

BigLep commented 1 year ago

Thanks for the update and work @guillaumemichel and @Jorropo .

Per 2023-06-14 verbal with @Jorropo , we agreed that we won't block the Kubo release on this. It can come in a followup Kubo release.

BigLep commented 1 year ago

@Jorropo : to be 100% clear for anyone watching, I assume we'll revert the change and do a patch release (rather than recommend consumers use the previous version)? (The functionality would be restored and include a fix as a followup patch release.)

Jorropo commented 1 year ago

yes