redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

CI Failure (key symptom) in `StorageFailureInjectionTest.test_storage_failure_injection` #21734

Closed by vbotbuildovich 2 months ago

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/15928

Module: rptest.tests.storage_failure_injection_test
Class: StorageFailureInjectionTest
Method: test_storage_failure_injection
test_id:    StorageFailureInjectionTest.test_storage_failure_injection
status:     FAIL
run time:   4.560 seconds

RemoteCommandError({'ssh_config': {'host': 'ip-172-31-4-208', 'hostname': '172.31.4.208', 'user': 'root', 'port': 22, 'password': None, 'identityfile': '/home/ubuntu/.ssh/id_rsa'}, 'hostname': 'ip-172-31-4-208', 'ssh_hostname': '172.31.4.208', 'user': 'root', 'externally_routable_ip': '35.89.118.77', '_logger': <Logger rptest.tests.storage_failure_injection_test.StorageFailureInjectionTest.test_storage_failure_injection-180 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0x7faf0b7314b0>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0x7faf0b7d1090>, '_custom_ssh_exception_checks': None}, 'host ip-172-31-4-208.us-west-2.compute.internal', 127, b'')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 105, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/storage_failure_injection_test.py", line 62, in test_storage_failure_injection
    self.redpanda.start()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2719, in start
    self.for_nodes(to_start, start_one)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1489, in for_nodes
    return list(executor.map(cb, nodes))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2711, in start_one
    self.start_node(node,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 3021, in start_node
    self.write_node_conf_file(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 3947, in write_node_conf_file
    fqdn = self.get_node_fqdn(node)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 3883, in get_node_fqdn
    fqdn = node.account.ssh_output(
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/cluster/remoteaccount.py", line 41, in wrapper
    return method(self, *args, **kwargs)
ducktape.cluster.remoteaccount.RemoteCommandError: root@ip-172-31-4-208: Command 'host ip-172-31-4-208.us-west-2.compute.internal' returned non-zero exit status 127.

JIRA Link: CORE-5879

rpdevmp commented 2 months ago

Duplicate of #21624. That GitHub issue has the full details of the many failing tests, along with the steps to avoid similar cases in the future, where many issues were opened for the same kind of infra error.
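
For anyone triaging similar reports: the traceback above ends with `host ip-172-31-4-208.us-west-2.compute.internal` returning exit status 127 and empty output. By POSIX shell convention, 127 means the shell could not find the command, which is consistent with the `host` binary being absent on the node rather than with a failure in the test itself. That reading is an inference from the exit status alone. The sketch below shows one way to tell this kind of infra error apart from a genuine command failure; `classify_exit_status` and `run_and_classify` are hypothetical helpers, not part of ducktape or rptest.

```python
import subprocess


def classify_exit_status(status: int) -> str:
    """Map a POSIX shell exit status to a coarse triage category."""
    if status == 0:
        return "ok"
    if status == 126:
        # The command was found but could not be executed.
        return "not_executable"
    if status == 127:
        # The shell could not find the command at all.
        return "command_not_found"
    if status > 128:
        # Shells report death by signal N as 128 + N.
        return "killed_by_signal"
    return "command_failed"


def run_and_classify(command: str) -> tuple[int, str]:
    """Run a command through the local shell and classify its exit status.

    Hypothetical local stand-in for a remote SSH call; it only
    illustrates the exit-status convention.
    """
    completed = subprocess.run(command, shell=True, capture_output=True)
    return completed.returncode, classify_exit_status(completed.returncode)
```

A status classified as `command_not_found` points at the node image (a missing package) and can be grouped under a single infra issue instead of opening one issue per failing test.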

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/15980
https://buildkite.com/redpanda/vtools/builds/16003
*https://buildkite.com/redpanda/vtools/builds/16016

vbotbuildovich commented 2 months ago

*https://buildkite.com/redpanda/vtools/builds/16030

rpdevmp commented 2 months ago

Duplicate of #21624. That GitHub issue has the full details of the many failing tests, along with the steps to avoid similar cases in the future, where many CI failures were opened for the same kind of infra error. The automatic GitHub updates and the AWS FIPS run should be fixed by https://github.com/redpanda-data/vtools/pull/3018