probe-lab / go-kademlia

Generic Go Kademlia implementation
Other
17 stars 4 forks source link

Fix flaky libp2p tests #105

Closed guillaumemichel closed 10 months ago

guillaumemichel commented 10 months ago

Addresses https://github.com/plprobelab/go-kademlia/issues/103

iand commented 10 months ago

Hit the race that is fixed in https://github.com/plprobelab/go-kademlia/pull/102

guillaumemichel commented 10 months ago

I am confident that TestConnections is fixed. The issue may have been introduced with go-libp2p 0.28, because loopback addresses may not be advertised anymore.

I am less confident with TestReqTimeout, because I am unable to reproduce the failure on my machine. I gave it a shot, and for now, it isn't failing on the CI (even though it only failed rarely). I will keep running the CI many times, and if the test isn't failing I would suggest that we merge it. If the same test is failing again in the future, we can open a new PR.

iand commented 10 months ago

You could run the following to reproduce the flake:

go test -c . && while ./libp2p.test -test.shuffle=on -test.count=100 -test.failfast; do date; done

guillaumemichel commented 10 months ago

TestReqTimeout isn't fixed. See https://github.com/plprobelab/go-kademlia/actions/runs/5892795414/job/16000386471#step:14:1

guillaumemichel commented 10 months ago

You could run the following to reproduce the flake:

go test -c . && while ./libp2p.test -test.shuffle=on -test.count=100 -test.failfast; do date; done

Unfortunately it doesn't help, the test keeps passing on my machine.

I have even run the following script for quite some time without getting the test to fail.

while true; do
    output=$(go1.20 test -run ^TestReqTimeout$)

    # Count the number of lines in the output
    line_count=$(echo "$output" | wc -l)

    # If the line count is greater than 2, the test failed. break out of the loop
    if (( line_count > 2 )); then
        echo "$output"
        break
    fi
done
guillaumemichel commented 10 months ago

TestReqTimeout alone took more than 1s to run on Windows Run Tests (32-bit). The issue probably comes from the timing, will try to reproduce the failure.

ok      github.com/plprobelab/go-kademlia/libp2p    1.136s
guillaumemichel commented 10 months ago

The tests have run successfully 15 times in addition to the displayed tests.

IMO this is good to merge