redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Failure in `NodesDecommissioningTest.test_decommissioning_working_node` #1892

Closed. 0x5d closed this issue 2 years ago

0x5d commented 3 years ago

The test `rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_working_node` failed when merging #1845 into dev, but passed when re-run. Attaching the logs here:

failing-ducktape.txt

0x5d commented 3 years ago

I also saw this one today:

test_id:    rptest.tests.kafka_client_compat_test.KafkaClientCompatTest.test_describe_broker_configs
status:     FAIL
run time:   20.236 seconds

    KafkaException(KafkaError{code=_TIMED_OUT,val=-185,str="Failed to get metadata: Local: Timed out"})
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ducktape/tests/runner_client.py", line 133, in run
    self.setup_test()
  File "/usr/local/lib/python3.8/dist-packages/ducktape/tests/runner_client.py", line 206, in setup_test
    self.test.setup()
  File "/usr/local/lib/python3.8/dist-packages/ducktape/tests/test.py", line 91, in setup
    self.setUp()
  File "/root/tests/rptest/tests/redpanda_test.py", line 54, in setUp
    self.redpanda.start()
  File "/root/tests/rptest/services/redpanda.py", line 101, in start
    wait_until(lambda: {n
  File "/usr/local/lib/python3.8/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.8/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/redpanda.py", line 101, in <lambda>
    wait_until(lambda: {n
  File "/root/tests/rptest/services/redpanda.py", line 103, in <setcomp>
    if self.registered(n)} == expected,
  File "/root/tests/rptest/services/redpanda.py", line 290, in registered
    brokers = client.brokers()
  File "/root/tests/rptest/clients/python_librdkafka.py", line 30, in brokers
    return client.list_topics(timeout=10).brokers
cimpl.KafkaException: KafkaError{code=_TIMED_OUT,val=-185,str="Failed to get metadata: Local: Timed out"}

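
For anyone reading the traceback: the metadata timeout is raised inside the condition passed to `wait_until`, so a single failed `list_topics` call aborts `setUp` instead of being retried. Below is a minimal sketch of a polling helper that treats such an exception as "not ready yet". It is only an illustration, not the fix adopted in this repo: `wait_until_tolerant` and `FlakyMetadata` are hypothetical names, and `RuntimeError` stands in for `KafkaException` so the snippet runs without a Kafka client installed.

```python
import time


def wait_until_tolerant(condition, timeout_sec, backoff_sec=0.1,
                        retry_on=(Exception,)):
    """Poll condition() until it is truthy or timeout_sec elapses.

    Exceptions listed in retry_on are treated as "not ready yet" rather
    than propagated. Hypothetical helper, not part of ducktape or rptest.
    """
    deadline = time.monotonic() + timeout_sec
    last_exc = None
    while True:
        try:
            if condition():
                return
            last_exc = None
        except retry_on as e:
            last_exc = e  # remember the failure, retry after the backoff
        if time.monotonic() >= deadline:
            # Chain the last swallowed exception so it still shows up in logs.
            raise TimeoutError(
                "condition not met within %s s" % timeout_sec) from last_exc
        time.sleep(backoff_sec)


class FlakyMetadata:
    """Stand-in for a client whose first few metadata requests time out."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def brokers(self):
        self.calls += 1
        if self.calls <= self.failures:
            # Stand-in for the KafkaException seen in the traceback above.
            raise RuntimeError("Failed to get metadata: Local: Timed out")
        return {1, 2, 3}


client = FlakyMetadata(failures=2)
wait_until_tolerant(lambda: client.brokers() == {1, 2, 3},
                    timeout_sec=5, backoff_sec=0.01)
```

Retrying like this would only hide the symptom during startup. If brokers never answer metadata requests, the helper still times out, and nothing in this thread confirms that the startup timeout shares a cause with the decommissioning failure.
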
VadimPlh commented 2 years ago

https://buildkite.com/vectorized/redpanda/builds/5071#f9ab8959-f7f6-446d-a1ae-dbf57aa421e1

in #3145

LenaAn commented 2 years ago

another one from https://github.com/vectorizedio/redpanda/pull/3445

gousteris commented 2 years ago

seen again https://buildkite.com/vectorized/redpanda/builds/6171#03b54b6d-64dd-45e6-bf38-2dca35acf4df

gousteris commented 2 years ago

another one: https://buildkite.com/vectorized/redpanda/builds/6267#08603e9f-3c89-4b40-92f5-ce4c6dad5f7c

redpanda_build_6267_debug-clang-amd64.log

jcsp commented 2 years ago

I just gave this the ci-failure label because it's open & has recent pings with failures, but it's quite old so the original report might be something different than the recent failures -- investigation should start with the most recent failures.

jcsp commented 2 years ago

It's possible that this has the same underlying cause as:

(where recovery to a new replica never quite catches up because of an issue where our reads don't see latest writes, so decom never completes)