redpanda-data / redpanda


[v23.2.x] CI Failure (`Redpanda node docker-rp-19 failed to stop in 30 seconds`) in `ShutdownTest.test_timely_shutdown_with_failures` #15295

Closed michael-redpanda closed 3 months ago

michael-redpanda commented 10 months ago

https://buildkite.com/redpanda/vtools/builds/10980#018c356d-9e3b-4936-a00b-6db91546a2b6

Module: rptest.tests.timely_shutdown_test
Class:  ShutdownTest
Method: test_timely_shutdown_with_failures
test_id:    rptest.tests.timely_shutdown_test.ShutdownTest.test_timely_shutdown_with_failures
status:     FAIL
run time:   3 minutes 32.134 seconds

    TimeoutError('Redpanda node docker-rp-19 failed to stop in 30 seconds')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/timely_shutdown_test.py", line 105, in test_timely_shutdown_with_failures
    self.redpanda.restart_nodes(leader)
  File "/root/tests/rptest/services/redpanda.py", line 882, in restart_nodes
    list(
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/tests/rptest/services/redpanda.py", line 883, in <lambda>
    executor.map(lambda n: self.stop_node(n, timeout=stop_timeout),
  File "/root/tests/rptest/services/redpanda.py", line 2738, in stop_node
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda node docker-rp-19 failed to stop in 30 seconds

michael-redpanda commented 10 months ago

The node does appear to shut down:

INFO  2023-12-04 16:06:20,102 [shard 0] main - application.cc:369 - Shutdown complete.
DEBUG 2023-12-04 16:06:20,104 [shard 0] seastar - reactor::drain

but maybe not within the allotted time? It appears to have happened on the fourth iteration through the restart loop.

michael-redpanda commented 9 months ago

Do not use this issue to track dev failures. If you observe a similar failure on dev, create a new issue; this issue is only for backports.

dotnwat commented 9 months ago

05:49 vs. 06:20

That is about 30 seconds between the time the stop appears to have been requested and the time the node actually finished shutting down. So something about the shutdown was slow, but it only missed the deadline by a second or so.
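For reference, the gap between the two timestamps can be checked with a few lines of Python. This is only a sketch: the timestamp format matches the log lines quoted earlier in this issue, but the stop-request time below is an assumption (the thread gives only "05:49", so the date, hour, and milliseconds are filled in for illustration), and `elapsed_seconds` is a hypothetical helper, not part of rptest.

```python
from datetime import datetime

# Timestamp layout of the Redpanda log lines quoted in this issue.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# ASSUMPTION: only "05:49" appears in the thread; the date, hour, and
# milliseconds are illustrative.
stop_requested = "2023-12-04 16:05:49,000"
# Taken from the "Shutdown complete." log line above.
shutdown_complete = "2023-12-04 16:06:20,102"

gap = elapsed_seconds(stop_requested, shutdown_complete)
print(f"shutdown took about {gap:.1f}s against a 30s deadline")
```

Under those assumptions the gap comes out just over the 30-second stop timeout, which is consistent with the failure being a near miss rather than a hang.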

This was also on a debug build, so lots of weird slowness could occur.

Might be worth having a different timeout for debug vs release.
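A minimal sketch of what a build-dependent timeout could look like. Everything here is hypothetical: `stop_timeout_for`, the `build_type` string, and the 2x multiplier are illustrative assumptions, not the existing rptest API. Only the 30-second base value comes from the failure message above.

```python
# Base stop timeout, taken from the "failed to stop in 30 seconds" error.
DEFAULT_STOP_TIMEOUT_SEC = 30

# ASSUMPTION: hypothetical multiplier for debug builds; the right value
# would need to be tuned against real CI runs.
DEBUG_TIMEOUT_MULTIPLIER = 2


def stop_timeout_for(build_type: str,
                     base: int = DEFAULT_STOP_TIMEOUT_SEC) -> int:
    """Return the stop timeout in seconds for the given build type.

    Debug builds get a longer deadline; any other build type keeps
    the base timeout unchanged.
    """
    if build_type.lower() == "debug":
        return base * DEBUG_TIMEOUT_MULTIPLIER
    return base


for build in ("debug", "release"):
    print(build, stop_timeout_for(build))
```

The result could then be passed as the `timeout` argument that `restart_nodes` forwards to `stop_node` in the traceback above, assuming the test can tell which build type it is running against.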

michael-redpanda commented 8 months ago

Marking sev/low as it may be a timeout issue with the test infrastructure.

piyushredpanda commented 3 months ago

Not seen in at least two months, closing