Closed guillaumemichel closed 1 year ago
Some tests are failing randomly, and it is a hell to debug
Note the tests timeout.
Some tests seem to enter randomly an infinite loop, and some seem to always enter an infinite loop, resulting in the tests timing out.
This isn't the case without the new go routine in peerFound
introduced in this PR. I don't understand the root cause, and it is hard to debug because of the nature of the tests.
I'm not up on the specifics here, but want to understand the testing situation. Do we have regression tests in place for the fix? Is this change causing other tests to fail? I know we want to get this out, but I'd like to understand what it takes to do it right.
@Jorropo and I debugged the tests, but were not able to get them to work yet. It seems unrelated to the new code, but it wouldn't be wise to push this change. We also uncovered new bugs. More debugging is required to fix the broken tests.
If Kubo has to be released shortly, I would recommend reverting this change and using go-libp2p-kad-dht v0.23.0, and the fix will (hopefully) make it to the next release.
Thanks for the update and work @guillaumemichel and @Jorropo .
Per 2023-06-14 verbal with @Jorropo , we agreed that we won't block the Kubo release on this. It can come in a followup Kubo release.
@Jorropo : to be 100% clear for anyone watching, I assume we'll revert the change and do a patch release (rather than recommend consumers use the previous version)? (The functionality would be restored and include a fix as a followup patch release.)
yes
addresses https://github.com/libp2p/go-libp2p-kad-dht/issues/849
depends on https://github.com/libp2p/go-libp2p-kbucket/pull/120
linked: https://github.com/ipfs/kubo/issues/9814