redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

CI Failure (Truncation couldn't be verified) in `OffsetForLeaderEpochArchivalTest.test_querying_archive` #19952

Closed r-vasquez closed 1 week ago

r-vasquez commented 3 months ago

https://buildkite.com/redpanda/redpanda/builds/50525#01903c5d-df0f-4a9a-ad4d-24aa0cb50fb6

Module: rptest.tests.offset_for_leader_epoch_archival_test
Class:  OffsetForLeaderEpochArchivalTest
Method: test_querying_archive
TimeoutError("truncation couldn't be verified for topic='topic-dphwwkhivk' and target_bytes=8192. last run partition_sizes=[526, 526, 522989103]")
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 105, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/offset_for_leader_epoch_archival_test.py", line 219, in test_querying_archive
    wait_for_local_storage_truncate(self.redpanda,
  File "/root/tests/rptest/util.py", line 287, in wait_for_local_storage_truncate
    wait_until(
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: truncation couldn't be verified for topic='topic-dphwwkhivk' and target_bytes=8192. last run partition_sizes=[526, 526, 522989103]
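The failure message suggests the wait compares each reported local partition size against target_bytes. Two of the sizes (526 bytes each) are already under the 8192-byte target, while the third still holds 522989103 bytes (roughly 523 MB), so the check never passes. Below is a minimal sketch of that predicate. It assumes the wait succeeds only when every reported size is at or below target_bytes. The function names are hypothetical and are not taken from rptest.util:

```python
# Hypothetical sketch: parameter names mirror the error message
# (target_bytes, partition_sizes); this is not the rptest implementation.
from typing import List


def is_truncated(partition_sizes: List[int], target_bytes: int) -> bool:
    """Return True only if every reported local size is at or below target_bytes."""
    return all(size <= target_bytes for size in partition_sizes)


def largest_offender(partition_sizes: List[int], target_bytes: int) -> int:
    """Return the index of the largest size above target_bytes, or -1 if none."""
    over = [(size, i) for i, size in enumerate(partition_sizes) if size > target_bytes]
    return max(over)[1] if over else -1


# Values taken from the failing run above.
sizes = [526, 526, 522989103]
print(is_truncated(sizes, 8192))      # False
print(largest_offender(sizes, 8192))  # 2
```

Under this assumption, a single lagging partition is enough to fail the whole wait, no matter how far the others have been truncated.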

JIRA Link: CORE-4271

dotnwat commented 3 months ago

https://buildkite.com/redpanda/redpanda/builds/50933#019067d4-e0f4-4d7e-b0c4-f00cbeb53fdb

jcipar commented 2 months ago

This is a timeout error from the call to wait_for_local_storage_truncate. In this test we override the default timeout of 120 seconds and set it to 30 seconds.
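The timeout bounds a polling loop. The condition is re-checked on a fixed backoff until it succeeds or the deadline passes. Lowering the timeout from 120 to 30 seconds therefore gives local retention a quarter of the time to bring the largest partition under target_bytes. Below is a simplified stand-in for that loop. It is not ducktape's actual code: the real helper raises ducktape.errors.TimeoutError, chains the last exception from the condition (`from last_exception` in the traceback), and has more options. The default backoff_sec here is an assumption.

```python
# Simplified stand-in for a ducktape-style wait_until, for illustration only.
import time
from typing import Callable, Union


def wait_until(condition: Callable[[], bool],
               timeout_sec: float,
               backoff_sec: float = 1.0,
               err_msg: Union[str, Callable[[], str]] = "") -> None:
    """Poll condition() until it returns truthy or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    # A callable message is evaluated lazily so it can report the last observed state,
    # which is how "last run partition_sizes=[...]" can end up in the error text.
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg)


# Usage: a condition that never succeeds times out with the lazily built message.
try:
    wait_until(lambda: False, timeout_sec=0.05, backoff_sec=0.01,
               err_msg=lambda: "truncation couldn't be verified")
except TimeoutError as e:
    print(e)  # truncation couldn't be verified
```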

@Lazin Do you remember why you set it that way?

Across all the tests, we override the default 9 times (not counting one call that explicitly sets it to 120) and use the default value 19 times.
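A tally like this can be reproduced mechanically by parsing the test sources rather than grepping them. The sketch below assumes the timeout is passed as a keyword named `timeout_sec` with a default of 120. Both are assumptions, so check them against the actual signature in rptest/util.py. The `tests/rptest` root path is likewise inferred from the traceback. The sketch keeps calls that explicitly pass the default separate from true overrides:

```python
# Sketch: classify calls by how they set the timeout keyword.
# The keyword name "timeout_sec" and default 120 are assumptions.
import ast
from pathlib import Path
from typing import Dict


def count_timeout_overrides(source: str,
                            func_name: str = "wait_for_local_storage_truncate",
                            kwarg: str = "timeout_sec",
                            default_value: int = 120) -> Dict[str, int]:
    """Classify calls to func_name by how they set the timeout keyword."""
    counts = {"override": 0, "explicit_default": 0, "default": 0}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Match bare calls f(...) as well as attribute calls such as util.f(...).
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            continue
        if name != func_name:
            continue
        kw = next((k for k in node.keywords if k.arg == kwarg), None)
        if kw is None:
            counts["default"] += 1
        elif isinstance(kw.value, ast.Constant) and kw.value.value == default_value:
            counts["explicit_default"] += 1
        else:
            # Any other literal or expression (e.g. a variable) counts as an override.
            counts["override"] += 1
    return counts


def scan_tree(root: str = "tests/rptest") -> Dict[str, int]:
    """Sum the per-file counts over every .py file under root."""
    totals = {"override": 0, "explicit_default": 0, "default": 0}
    for path in Path(root).rglob("*.py"):
        for key, n in count_timeout_overrides(path.read_text()).items():
            totals[key] += n
    return totals


sample = '''
wait_for_local_storage_truncate(redpanda, topic, target_bytes=8192, timeout_sec=30)
wait_for_local_storage_truncate(redpanda, topic, target_bytes=8192)
util.wait_for_local_storage_truncate(redpanda, topic, target_bytes=1, timeout_sec=120)
'''
print(count_timeout_overrides(sample))
# {'override': 1, 'explicit_default': 1, 'default': 1}
```

Parsing with `ast` ignores mentions in comments and strings. It also skips the function's own `def`, which a plain text search would count.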

piyushredpanda commented 1 week ago

Closing older bot-filed CI issues as we transition to a more reliable system.