redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

CI Failure (key symptom) in `ControllerAvailabilityTest.test_controller_availability_with_nodes_down` #22379

Closed vbotbuildovich closed 2 months ago

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/15928 https://buildkite.com/redpanda/vtools/builds/15944 https://buildkite.com/redpanda/vtools/builds/15952

Module: rptest.tests.controller_availability_test
Class: ControllerAvailabilityTest
Method: test_controller_availability_with_nodes_down
Arguments: {
    "stop": "minority",
    "cluster_size": 3
}
test_id:    ControllerAvailabilityTest.test_controller_availability_with_nodes_down
status:     FAIL
run time:   5.634 seconds

RemoteCommandError({'ssh_config': {'host': 'ip-172-31-5-33', 'hostname': '172.31.5.33', 'user': 'root', 'port': 22, 'password': None, 'identityfile': '/home/ubuntu/.ssh/id_rsa'}, 'hostname': 'ip-172-31-5-33', 'ssh_hostname': '172.31.5.33', 'user': 'root', 'externally_routable_ip': '54.201.146.94', '_logger': <Logger rptest.tests.controller_availability_test.ControllerAvailabilityTest.test_controller_availability_with_nodes_down.cluster_size=3.stop=minority-443 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0x7faf09c0ec20>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0x7faf09c0d0c0>, '_custom_ssh_exception_checks': None}, 'host ip-172-31-5-33.us-west-2.compute.internal', 127, b'')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 105, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/controller_availability_test.py", line 83, in test_controller_availability_with_nodes_down
    self.start_redpanda(cluster_size)
  File "/home/ubuntu/redpanda/tests/rptest/tests/controller_availability_test.py", line 33, in start_redpanda
    self.redpanda.start()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2719, in start
    self.for_nodes(to_start, start_one)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1489, in for_nodes
    return list(executor.map(cb, nodes))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2711, in start_one
    self.start_node(node,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 3021, in start_node
    self.write_node_conf_file(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 3947, in write_node_conf_file
    fqdn = self.get_node_fqdn(node)
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-5-33: Command 'host ip-172-31-5-33.us-west-2.compute.internal' returned non-zero exit status 127.
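
Note on the exit status: by shell convention, 127 means the command itself could not be found, so the failure above points to `host` being unavailable on the node rather than to a DNS lookup that ran and failed. The sketch below shows one way a harness might tell that case apart from a real command failure. The names `classify_exit_status` and `missing_tools` are hypothetical and not part of rptest or ducktape, and the tool check runs on the local machine, whereas the real test runs commands on a remote node over SSH.

```python
import shutil

# Shell conventions: 126 = found but not executable, 127 = command not found.
NOT_EXECUTABLE = 126
COMMAND_NOT_FOUND = 127


def classify_exit_status(status: int) -> str:
    """Map a shell exit status to a coarse failure category (hypothetical helper)."""
    if status == 0:
        return "ok"
    if status == COMMAND_NOT_FOUND:
        return "command-not-found"
    if status == NOT_EXECUTABLE:
        return "not-executable"
    return "command-failed"


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from `tools` that are not on the local PATH.

    Assumption: a pre-flight check like this would run before the test
    starts; the real harness would have to run it on each remote node.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # The status reported in the traceback above.
    print(classify_exit_status(127))
    # Lists any of these tools that are absent on this machine.
    print(missing_tools(["host"]))
```

A status classified as `command-not-found` would suggest an environment or image problem on the node, which is consistent with this issue being treated as an infra error rather than a product bug.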

JIRA Link: CORE-6530

michael-redpanda commented 2 months ago

Automatically closing issue to match current state of CORE-6530

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/16003 https://buildkite.com/redpanda/vtools/builds/16016 https://buildkite.com/redpanda/vtools/builds/16030

rpdevmp commented 2 months ago

Duplicate of #21624. That GH issue has the details on the many failing tests and the steps to avoid similar cases in the future, where many CI failures were opened for the same kind of infra error. The automatic GitHub updates and the AWS FIPS run should be fixed by https://github.com/redpanda-data/vtools/pull/3018