Open ravicious opened 1 year ago
This just blocked a v13 release: https://github.com/gravitational/teleport/actions/runs/5903184778/job/16012575863
integration_test.go:4018:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4018
/__w/teleport/teleport/integration/integration_test.go:119
Error: Received unexpected error:
failed connecting to host 127.0.0.1:44391: failed to dial target host
unable to fetch access point, this proxy clusterPeer(TunnelConnection(name=19978ea4-1790-45e8-bb71-af0b41581184-cluster-remote, type=, cluster=cluster-remote, proxy=19978ea4-1790-45e8-bb71-af0b41581184)) has not been discovered yet, try again later
Test: TestIntegrations/Discovery
Another one on master: https://github.com/gravitational/teleport/actions/runs/6027048272/job/16351354131
integration_test.go:4044:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4044
/__w/teleport/teleport/integration/integration_test.go:125
Error: Received unexpected error:
failed connecting to host 127.0.0.1:41815: failed to receive cluster details response
failed to dial target host
unable to fetch access point, this proxy clusterPeer(TunnelConnection(name=2f6a37a0-959d-474b-9a9b-325beb70a9fc-cluster-remote, type=, cluster=cluster-remote, proxy=2f6a37a0-959d-474b-9a9b-325beb70a9fc)) has not been discovered yet, try again later
Test: TestIntegrations/Discovery
different failure on master:
https://github.com/gravitational/teleport/actions/runs/6227775190/job/16903131172?pr=31916
integration_test.go:4048:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4052
/opt/go/src/runtime/asm_amd64.s:1650
Error: Expected value not to be nil.
integration_test.go:4048:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4048
/__w/teleport/teleport/integration/integration_test.go:123
Error: Condition never satisfied
Test: TestIntegrations/Discovery
I got the same failure as Gavin on v14 https://github.com/gravitational/teleport/actions/runs/6422952356/job/17440476992
https://github.com/gravitational/teleport/actions/runs/6565574843/job/17834398083
integration_test.go:4187:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6166
/opt/go/src/runtime/asm_amd64.s:1650
Error: Expected value not to be nil.
integration_test.go:4187:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6162
/__w/teleport/teleport/integration/integration_test.go:4187
/__w/teleport/teleport/integration/integration_test.go:123
Error: Condition never satisfied
Test: TestIntegrations/Discovery
https://github.com/gravitational/teleport/actions/runs/6579308041/job/17874902654
integration_test.go:4187:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6166
/opt/go/src/runtime/asm_amd64.s:1650
Error: Expected value not to be nil.
integration_test.go:4187:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6162
/__w/teleport/teleport/integration/integration_test.go:4187
/__w/teleport/teleport/integration/integration_test.go:123
Error: Condition never satisfied
Test: TestIntegrations/Discovery
integration_test.go:4267:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6248
/opt/go/src/runtime/asm_amd64.s:1650
Error: Expected value not to be nil.
integration_test.go:4267:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6244
/__w/teleport/teleport/integration/integration_test.go:4267
/__w/teleport/teleport/integration/integration_test.go:126
Error: Condition never satisfied
Test: TestIntegrations/Discovery
I ran into a new kind of error on branch/v14:
https://github.com/gravitational/teleport/actions/runs/6960808427/job/18941067310#step:6:2122
integration_test.go:4267:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6254
/opt/go/src/runtime/asm_amd64.s:1650
Error: Received unexpected error:
connection error: desc = "transport: Error while dialing: failed to dial: cluster cluster-remote is offline"
integration_test.go:4267:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6244
/__w/teleport/teleport/integration/integration_test.go:4267
/__w/teleport/teleport/integration/integration_test.go:126
Error: Condition never satisfied
Test: TestIntegrations/Discovery
Failure from a PR rebased with the tip of master
=== FAIL: integration TestIntegrations/Discovery (20.69s)
integration_test.go:4144:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4144
/__w/teleport/teleport/integration/integration_test.go:124
Error: Received unexpected error:
failed connecting to host 127.0.0.1:45901: failed to receive cluster details response
failed to dial target host
direct dialing to nodes not found in inventory is not supported
Test: TestIntegrations/Discovery
--- FAIL: TestIntegrations/Discovery (20.69s)
=== FAIL: integration TestIntegrations (719.38s)
https://github.com/gravitational/teleport/actions/runs/7020003123/job/19098756968?pr=34457
Error on a branch based on branch/v14, it includes the backport of the recent fix (#34996).
https://github.com/gravitational/teleport/actions/runs/7031231504/job/19132398108#step:6:3561
integration_test.go:4233:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4233
/__w/teleport/teleport/integration/integration_test.go:126
Error: Received unexpected error:
failed connecting to host 127.0.0.1:34309: failed to receive cluster details response
failed to dial target host
unable to fetch node watcher, this proxy clusterPeer(TunnelConnection(name=00000000-0000-0000-0000-000000000000-cluster-remote, type=, cluster=cluster-remote, proxy=00000000-0000-0000-0000-000000000000)) has not been discovered yet, try again later
Test: TestIntegrations/Discovery
Another in the merge queue to master https://github.com/gravitational/teleport/actions/runs/7090182023/job/19296583245
=== RUN TestIntegrations/Discovery
integration_test.go:4172:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6155
/opt/go/src/runtime/asm_amd64.s:1650
Error: Expected value not to be nil.
integration_test.go:4172:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:6151
/__w/teleport/teleport/integration/integration_test.go:4172
/__w/teleport/teleport/integration/integration_test.go:126
Error: Condition never satisfied
Test: TestIntegrations/Discovery
--- FAIL: TestIntegrations/Discovery (50.00s)
https://github.com/gravitational/teleport/actions/runs/7115810275/job/19372821232#step:6:1122
integration_test.go:4146:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4146
/__w/teleport/teleport/integration/integration_test.go:126
Error: Received unexpected error:
failed connecting to host 127.0.0.1:34889: failed to receive cluster details response
failed to dial target host
direct dialing to nodes not found in inventory is not supported
Test: TestIntegrations/Discovery
v13
https://github.com/gravitational/teleport/actions/runs/7127815036/job/19408522862#step:6:1909
integration_test.go:4121:
Error Trace: /__w/teleport/teleport/integration/integration_test.go:4121
/__w/teleport/teleport/integration/integration_test.go:123
Error: Received unexpected error:
failed connecting to host 127.0.0.1:36773: failed to receive cluster details response
failed to dial target host
unable to fetch node watcher, this proxy clusterPeer(TunnelConnection(name=00000000-0000-0000-0000-000000000000-cluster-remote, type=, cluster=cluster-remote, proxy=00000000-0000-0000-0000-000000000000)) has not been discovered yet, try again later
Test: TestIntegrations/Discovery
This is an issue again. Fresh hits on master:
=== Failed
=== FAIL: integration TestIntegrations/Discovery (47.02s)
trustedclusters.go:148:
Error Trace: /__w/teleport/teleport/integration/helpers/trustedclusters.go:159
/opt/go/src/runtime/asm_amd64.s:1650
Error: Received unexpected error:
unable to fetch client, this proxy clusterPeer(TunnelConnection(name=63cc50e2-09e3-4d28-81b2-31e0542e0311-cluster-remote, type=, cluster=cluster-remote, proxy=63cc50e2-09e3-4d28-81b2-31e0542e0311)) has not been discovered yet, try again later
Messages: cluster not yet available
trustedclusters.go:148:
Error Trace: /__w/teleport/teleport/integration/helpers/trustedclusters.go:148
/__w/teleport/teleport/integration/integration_test.go:4172
/__w/teleport/teleport/integration/integration_test.go:126
Error: Condition never satisfied
Test: TestIntegrations/Discovery
Messages: Active tunnel connections did not reach 1 in the expected time frame 30s
--- FAIL: TestIntegrations/Discovery (47.02s)
=== FAIL: integration TestIntegrations (763.62s)
Another on master, which is preventing us from getting an important build fix out.
https://github.com/gravitational/teleport/actions/runs/7718393459/job/21039553375?pr=37568
This failed repeatedly for the v15.0.2 release, delaying it by a couple of hours
v14 release as well: https://github.com/gravitational/teleport/actions/runs/7937403150/job/21674512929
I'll prioritize fixing this next week.
grazie mille @espadolini 🙏
Extending the timeout in WaitForActiveTunnelConnections
and logging some timestamps revealed that it can take longer than 30 seconds for tunnels to appear, depending on how unlucky we get with how many attempts the remote cluster agentpool takes before the expected reverse tunnel server is available and how unlucky we get while it's available.
This is intended behavior, since we want reconnection attempts to gradually back off after a small initial burst - however, it's very undesirable in tests.
With a very hacky change (3a51a830a0dc0cd3e6ad816066f8048b33be6876) I haven't seen a WaitForActiveTunnelConnections
take more than 0.5 seconds(!), but the cpu usage during the test can spike quite a bit (I've seen agent lease IDs in the 50+).
I'm not entirely sure of what's the least worse way to make the agent pools more "aggressive" in our tests - and I'm not even sure if we truly want to, in integration tests - perhaps we should change Teleport itself to reconnect slightly more often even in actual builds, at least for the remote cluster agent pool?
Failure
Link(s) to logs
Relevant snippet