redpanda-data / redpanda


CI Failure (key symptom) in `OMBValidationTest.test_max_partitions` #18791

Status: Closed (vbotbuildovich closed 1 week ago)

vbotbuildovich commented 1 month ago

https://buildkite.com/redpanda/vtools/builds/14407

Module: rptest.redpanda_cloud_tests.omb_validation_test
Class: OMBValidationTest
Method: test_max_partitions
test_id:    OMBValidationTest.test_max_partitions
status:     FAIL
run time:   1015.422 seconds

CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cpfvnpnqlv0o9rdteq20-agent', 'kubectl', 'get', 'pods', '-n', 'redpanda', '-o', 'json'], '', '\x1b[31mERROR: \x1b[0mfailed connecting to host cpfvnpnqlv0o9rdteq20-agent:0: failed to receive cluster details response\n\tfailed to dial target host\n\tTeleport proxy failed to connect to "node" agent "@local-node" over reverse tunnel:\n\n  no tunnel connection found: no node reverse tunnel for 4f118dc8-ba33-470b-a153-4919dce00d2a.proxy.tp.redpanda.com found\n\nThis usually means that the agent is offline or has disconnected. Check the\nagent logs and, if the issue persists, try restarting it or re-registering it\nwith the cluster.\n\n')
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 105, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/omb_validation_test.py", line 617, in test_max_partitions
    self.redpanda.assert_cluster_is_reusable()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2126, in assert_cluster_is_reusable
    uh_reason = self.cluster_unhealthy_reason()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2208, in cluster_unhealthy_reason
    ret = self.kubectl.exec('rpk cluster health')
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 275, in exec
    return self._ssh_cmd(cmd)  # type: ignore
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 235, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 215, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cpfvnpnqlv0o9rdteq20-agent', 'kubectl', 'exec', 'rp-cpfvnpnqlv0o9rdteq20-0', '-n=redpanda', '-c=redpanda', '--', 'bash', '-c', '"rpk cluster health"']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 128, in wrapped
    redpanda.raise_on_crash(log_allow_list=log_allow_list)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2172, in raise_on_crash
    active, _, _ = self.get_redpanda_pods_presorted()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1757, in get_redpanda_pods_presorted
    all_pods = self.get_redpanda_pods()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1785, in get_redpanda_pods
    pods = json.loads(self.kubectl.cmd('get pods -n redpanda -o json'))
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 259, in cmd
    return self._ssh_cmd(cmd, capture=capture)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 235, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 215, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cpfvnpnqlv0o9rdteq20-agent', 'kubectl', 'get', 'pods', '-n', 'redpanda', '-o', 'json']' returned non-zero exit status 1.

JIRA Link: CORE-3199

travisdowns commented 1 week ago
{ "duplicate": "https://github.com/redpanda-data/redpanda/issues/19922" }

Same "no reverse tunnel" underlying error as above.
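
For triage, a failure like this could be recognized by checking the captured output of the failed `tsh ssh` command for the Teleport message quoted in the report. The sketch below is only an illustration: `is_reverse_tunnel_failure` and `TUNNEL_ERROR_MARKERS` are hypothetical names and not part of rptest, and the marker strings are copied from the error text in this issue rather than from any Teleport specification, so they may need adjusting if the wording changes.

```python
import subprocess

# Substrings copied from the Teleport error text quoted in this issue.
# Assumption: the wording is stable enough to match on.
TUNNEL_ERROR_MARKERS = (
    "no tunnel connection found",
    "no node reverse tunnel",
)


def is_reverse_tunnel_failure(err: subprocess.CalledProcessError) -> bool:
    """Return True if the failed command's captured output contains the
    Teleport 'no reverse tunnel' signature.

    Hypothetical helper for illustration; not part of rptest.
    """
    parts = []
    for stream in (err.output, err.stderr):
        if stream is None:
            continue
        # Captured output may be bytes or str depending on how it was run.
        if isinstance(stream, bytes):
            stream = stream.decode("utf-8", errors="replace")
        parts.append(stream)
    text = "\n".join(parts)
    return any(marker in text for marker in TUNNEL_ERROR_MARKERS)
```

A caller could use this to label the run as an infrastructure problem (agent offline or disconnected) instead of a Redpanda failure, for example by catching `subprocess.CalledProcessError` around the health check and testing the exception with this helper before re-raising.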

travisdowns commented 1 week ago

Duplicate of https://github.com/redpanda-data/redpanda/issues/19922