Closed — r-vasquez closed this issue 1 year ago
https://buildkite.com/redpanda/redpanda/builds/21719#0185e14a-0846-43ea-bd4b-98deea9dcaaf
FAIL test: NodesDecommissioningTest.test_flipping_decommission_recommission.node_is_alive=False (2/8 runs)
failure at 2023-01-24T02:11:27.574Z: AssertionError('Node 1 decommissioning stopped making progress')
This is a real issue that may happen in a boundary condition when one wants to recommission a node that is offline.
Got this (here: https://buildkite.com/redpanda/redpanda/builds/21716#0185e0d8-eb06-4c6e-ba2a-6f3fb6d8821b/6-2207) in the same test:
RpkException('command /var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-0175d7750481fef5a-1/redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk --api-urls docker-rp-6:9644,docker-rp-22:9644,docker-rp-20:9644 cluster config set raft_learner_recovery_rate 1 returned 1, output: ', 'error setting property: request PUT http://docker-rp-6:9644/v1/cluster_config failed: Service Unavailable, body: "{\\"message\\": \\"Leader not available\\", \\"code\\": 503}"\n\n')
--
The "{\\"message\\": \\"Leader not available\\", \\"code\\": 503}" response comes from the Admin API. Do you think it's related, or should I open a new issue?
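For reference, a 503 "Leader not available" during leader election is usually transient, so the rpk call can simply be retried. A minimal sketch of a retry wrapper (the `retry` helper, its attempt count, and its delay are hypothetical, not part of rpk or the test harness):

```shell
#!/bin/sh
# Hypothetical retry wrapper for transient Admin API errors such as
# HTTP 503 "Leader not available". Runs the given command up to
# $max times, sleeping $delay seconds between attempts.
retry() {
  max=5
  delay=1
  i=1
  while true; do
    "$@" && return 0            # success: stop retrying
    [ "$i" -ge "$max" ] && return 1  # give up after max attempts
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example usage (hosts taken from the failing test's log):
# retry rpk --api-urls docker-rp-6:9644,docker-rp-22:9644,docker-rp-20:9644 \
#   cluster config set raft_learner_recovery_rate 1
```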
@r-vasquez it has a different stack trace, so it should be a different issue.
Probably old bits: https://buildkite.com/redpanda/redpanda/builds/21837#0185e9e5-f853-46c8-a592-9b7a76d68586
FAIL test: NodesDecommissioningTest.test_flipping_decommission_recommission.node_is_alive=False (1/39 runs)
failure at 2023-01-25T18:18:07.160Z: AssertionError('Node 1 decommissioning stopped making progress')
on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/21837#0185e9e5-f853-46c8-a592-9b7a76d68586
https://buildkite.com/redpanda/redpanda/builds/21437#0185c5bc-b8ec-48bc-a229-dfd05b5d6bd6/6-2504