redpanda-data / redpanda


CI Failure (GetParam() = false) in `gtest_cluster_cloud_metadata_rpfixture.WithLeadershipChanges/ClusterRecoveryBackendLeadershipParamTest.TestRecoveryControllerState` #17813

Closed by andijcr 6 months ago

andijcr commented 7 months ago

https://buildkite.com/redpanda/redpanda/builds/47610#018ec863-93f3-4657-8144-7da19cead72b

gtest_cluster_cloud_metadata_rpfixture

`WithLeadershipChanges/ClusterRecoveryBackendLeadershipParamTest.TestRecoveryControllerState` where GetParam() = false

release build

The failure should be on dev: the originating PR does not modify any C++ code.

JIRA Link: CORE-2338

dotnwat commented 7 months ago

@andijcr would appreciate it if you used the CI failure template

andijcr commented 7 months ago

> @andijcr would appreciate it if you used the CI failure template

@dotnwat Do we have a template for fixture test failures? The one we have is for ducktape and it's not clear what to write and where

rockwotj commented 7 months ago

We should make one because these are fairly common. The problem with the ducktape one is that it breaks pandatriage.

dotnwat commented 7 months ago

@andijcr good point I was reading this too fast and thought it was ducktape :)

andrwng commented 7 months ago
73 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=0, tombstone=0, index={file:test.dir_1712765242/redpanda/kvstore/0_0/0-0-v1.base_index, offsets:0, index:{header_bitflags:0, base_offset:0, max_offset:38, base_timestamp:{timestamp: 1712765243573}, max_timestamp:{timestamp: 1712765243842}, batch_timestamps_are_monotonic:1, with_offset:false, non_data_timestamps:0, broker_timestamp:{{timestamp: 1712765243842}}, num_compactible_records_appended:{39}, index(1,1,1)}, step:32768, needs_persistence:0}}
unknown file: Failure
C++ exception with description "configuration property cloud_storage_secret_key is not set" thrown in the test body.

[  FAILED  ] WithLeadershipChanges/ClusterRecoveryBackendLeadershipParamTest.TestRecoveryControllerState/0, where GetParam() = false (1097 ms)

This looks like a test bug. We reset the shard local config immediately after restarting the application, but before the restore completes. This causes a race: by the time the cluster restore attempts to perform topic recovery, the cluster configs for cloud storage have been wiped, which matches the "cloud_storage_secret_key is not set" exception in the log above.
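
To make the ordering problem concrete, here is a minimal single-threaded sketch of it. It is not Redpanda code: `FakeConfig`, `start_restore`, `run_buggy_order`, and `run_fixed_order` are hypothetical names, and the restore's remaining work is modeled as a deferred step rather than a real Seastar future. It only shows that resetting the config before the deferred topic recovery step runs produces the same "not set" error, while waiting for the restore first does not.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the shard-local config (not Redpanda's real type).
struct FakeConfig {
    std::optional<std::string> cloud_storage_secret_key;

    // Throws if the property was never set or has been wiped.
    const std::string& secret_key() const {
        if (!cloud_storage_secret_key) {
            throw std::runtime_error(
              "configuration property cloud_storage_secret_key is not set");
        }
        return *cloud_storage_secret_key;
    }

    void reset() { cloud_storage_secret_key.reset(); }
};

// Hypothetical restore: returns the still-pending topic recovery step.
// The config is read only when that step eventually runs.
std::function<bool()> start_restore(const FakeConfig& cfg) {
    return [&cfg]() -> bool {
        try {
            (void)cfg.secret_key();
            return true; // topic recovery could read the cloud storage config
        } catch (const std::runtime_error&) {
            return false; // config was wiped before topic recovery ran
        }
    };
}

// Buggy ordering: the test resets the config while the restore is pending.
bool run_buggy_order() {
    FakeConfig cfg{std::string("secret")};
    auto pending_restore = start_restore(cfg);
    cfg.reset();              // reset happens before the restore completes
    return pending_restore(); // topic recovery now sees an empty config
}

// Fixed ordering: let the restore finish, then reset the config.
bool run_fixed_order() {
    FakeConfig cfg{std::string("secret")};
    auto pending_restore = start_restore(cfg);
    bool ok = pending_restore(); // wait for the restore to complete
    cfg.reset();                 // safe: nothing reads the config afterwards
    return ok;
}
```

In this model `run_buggy_order()` reports a failed restore and `run_fixed_order()` reports success, so one possible fix on the test side would be to wait for the recovery to finish before resetting the config. Whether that is the right fix for the actual fixture is an assumption.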