Open prestonvanloon opened 2 years ago
I'm interested in helping on this task. I just followed the Contributor guide and got a local environment to finish bazelisk test //beacon-chain/node:go_default_test. So I think I have the baseline ready.
Reading the test in TestVerifyConnectivity, some initial thoughts are:
The next steps I can attempt are to gather data to narrow down what is contributing to the failures. For example: choosing a destination that expressly accepts test requests, increasing the delay between test run iterations, increasing the dial timeout, or trying UDP port 53 (DNS).
Hey @patterns, thanks for taking a look at this! Just assigned you the issue. It seems you have already made good progress on this :), looking forward to the improvements to these tests.
Much appreciated, thanks! I think I have the TestVerifyConnectivity changes for a unit test. I borrowed most of the code from dial_test.go in the standard library's net package and realized, when looking there, that they distinguish between an available external network and a "local" one for the test environment. That made me wonder whether TestVerifyConnectivity is meant to exercise net.Dial(), because it shouldn't be necessary to test that net.Dial() works. It made me think that the purpose of the test is really to verify that a log entry is generated under the right circumstances (unreachable addresses). This seems reasonable because, when I looked for where it is called, it does a quick "health check" on the Host-Address config param (line 266 in service.go). And I think the param option is the p2p-host-ip:
--p2p-host-ip value The IP address advertised by libp2p. This may be used to advertise an external IP.
I need to confirm whether this is correct. At the same time, I'll try to understand the second failing test (the Discovery-Attempts one).
Hey @patterns,
That is correct, TestVerifyConnectivity is used to verify that dials to external IPs work as expected.
🐞 Bug Report
Description
p2p tests are extremely flaky. These tests typically pass around one in ten runs.
I ran this test with
--test_strategy=exclusive --runs_per_test=500
overnight and the result is as follows:
Has this worked before in a previous version?
This has been an issue for some time. It passes in CI as this package is rarely changed and a passing test is cached between CI requests.
🔬 Minimal Reproduction
bazel or go test
🔥 Error
🌍 Your Environment
Operating System:
What version of Prysm are you running? (Which release)
Latest develop commit
Anything else relevant (validator index / public key)?