redpanda-data / redpanda


CI Failure (key symptom) in `OMBValidationTest.test_max_connections` #21404

Closed: vbotbuildovich closed this issue 2 months ago

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/15511

Module: rptest.redpanda_cloud_tests.omb_validation_test
Class: OMBValidationTest
Method: test_max_connections
test_id:    OMBValidationTest.test_max_connections
status:     FAIL
run time:   1273.547 seconds

CalledProcessError(1, ['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cq7pjkr80m0kpn268rrg-agent', 'kubectl', 'get', 'pods', '-n', 'redpanda', '-o', 'json'], '', '\x1b[31mERROR: \x1b[0mfailed connecting to host cq7pjkr80m0kpn268rrg-agent:0: failed to receive cluster details response\n\tfailed to dial target host\n\tTeleport proxy failed to connect to "node" agent "@local-node" over reverse tunnel:\n\n  no tunnel connection found: no node reverse tunnel for bbb93908-c46c-464d-9243-ae53e963624e.proxy.tp.redpanda.com found\n\nThis usually means that the agent is offline or has disconnected. Check the\nagent logs and, if the issue persists, try restarting it or re-registering it\nwith the cluster.\n\n')
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 105, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/omb_validation_test.py", line 411, in test_max_connections
    assert_no_rejected()
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/omb_validation_test.py", line 285, in assert_no_rejected
    rejected_now = self._rejected_count()
  File "/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests/omb_validation_test.py", line 757, in _rejected_count
    return self.redpanda.metric_sum(REJECTED_METRIC)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2068, in metric_sum
    return self._metric_sum(metric_name, pods, metrics_endpoint, namespace,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1210, in _metric_sum
    metrics = self.metrics(n, metrics_endpoint=metrics_endpoint)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2006, in metrics
    text = self.kubectl.exec(
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 292, in exec
    return self._ssh_cmd(cmd)  # type: ignore
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 252, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 232, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cq7pjkr80m0kpn268rrg-agent', 'kubectl', 'exec', 'rp-cq7pjkr80m0kpn268rrg-0', '-n=redpanda', '-c=redpanda', '--', 'bash', '-c', '"curl -f -s -S http://localhost:9644/metrics"']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 128, in wrapped
    redpanda.raise_on_crash(log_allow_list=log_allow_list)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2180, in raise_on_crash
    active, _, _ = self.get_redpanda_pods_presorted()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1765, in get_redpanda_pods_presorted
    all_pods = self.get_redpanda_pods()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1793, in get_redpanda_pods
    pods = json.loads(self.kubectl.cmd('get pods -n redpanda -o json'))
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 276, in cmd
    return self._ssh_cmd(cmd, capture=capture)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 252, in _ssh_cmd
    return self._local_cmd(local_cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/kubectl.py", line 232, in _local_cmd
    raise subprocess.CalledProcessError(process.returncode, cmd, s_out,
subprocess.CalledProcessError: Command '['tsh', 'ssh', '--proxy=proxy.tp.redpanda.com:443', '--auth=okta', '--identity=/tmp/machine-id/identity', 'redpanda@cq7pjkr80m0kpn268rrg-agent', 'kubectl', 'get', 'pods', '-n', 'redpanda', '-o', 'json']' returned non-zero exit status 1.
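
Both tracebacks end in the same `tsh ssh` call failing with exit status 1, and the captured stderr in the first `CalledProcessError` reports "no tunnel connection found", i.e. the Teleport agent was unreachable rather than Redpanda rejecting connections. A minimal sketch of how a caller could retry only that kind of failure is below. It is not the rptest implementation: `run_with_retry`, `is_transient_tunnel_error`, `TRANSIENT_MARKERS`, and the attempt/backoff values are hypothetical names and defaults chosen for illustration, and treating this error as transient is itself an assumption.

```python
import subprocess
import time

# Assumption: this substring, copied from the stderr in the report above,
# identifies a Teleport reverse-tunnel outage that may be worth retrying.
TRANSIENT_MARKERS = ("no tunnel connection found", )


def is_transient_tunnel_error(err: subprocess.CalledProcessError) -> bool:
    """Return True if the captured output contains a known transient marker."""
    text = f"{err.output or ''}{err.stderr or ''}"
    return any(marker in text for marker in TRANSIENT_MARKERS)


def run_with_retry(cmd,
                   attempts=3,
                   backoff_s=5.0,
                   runner=subprocess.run,
                   sleep=time.sleep):
    """Run cmd and return its stdout, retrying only on transient tunnel errors.

    Any other failure, or a transient failure on the last attempt, is re-raised
    unchanged so the original CalledProcessError stays visible to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            proc = runner(cmd, capture_output=True, text=True, check=True)
            return proc.stdout
        except subprocess.CalledProcessError as err:
            if attempt == attempts or not is_transient_tunnel_error(err):
                raise
            # Linear backoff: wait longer after each failed attempt.
            sleep(backoff_s * attempt)
```

The `runner` and `sleep` parameters exist only so the helper can be exercised without a real Teleport proxy. Whether a retry is appropriate here, or whether the test should instead fail fast and be classified as an infrastructure failure, is a separate decision that this sketch does not settle.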

JIRA Link: CORE-5613

dotnwat commented 2 months ago

dupe