redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.61k stars 585 forks source link

CI Failure (BadLogLines) in `RandomNodeOperationsTest.test_node_operations` #15150

Closed andijcr closed 1 month ago

andijcr commented 11 months ago

https://buildkite.com/redpanda/vtools/builds/10805

This issue is an umbrella for 3 instances of bad log lines:

  1. Can't apply override to kafka start offset -> Fixed in #15192
  2. Failed to make reupload candidate to match the run, candidate: {source segment offsets: {term:3, base_offset:15, committed_offset:483, dirty_offset:483}, exposed_name: {15-3-v1.log}, starting_offset: 15, file_offset: 0, content_length: 5108, final_offset: 483, final_file_offset: 5108, term: 3, source names: {/var/lib/redpanda/data/kafka/fuzzy-operator-2047-ygyvas/0_6696/15-3-v1.log}}, run: {is_compacted: true, size_bytes: 5224, base_offset: 15, committed_offset: 483, base_timestamp: {timestamp: 1703239361564}, max_timestamp: {timestamp: 1703239430338}, delta_offset: 15, ntp_revision: 64, archiver_term: 4, segment_term: 3, delta_offset_end: 25, sname_format: {v3}, metadata_size_hint: 0} Discussed in https://redpandadata.slack.com/archives/C05S5SBT4CU/p1704303384062129 Builds:
  3. ERROR 2023-12-14 11:38:06,999 [shard 0:main] cluster - topic_table.cc:184 - Unexpected record at offset 5399 to drop non-existent lifecycle marker {kafka/fuzzy-operator-5709-agcwue} 3685 Builds:
  4. ERROR 2023-12-18 11:13:10,469 [shard 0:au ] archival - [fiber1~3|0|19974ms] - purger.cc:168 - Permanent failures encountered while purging partition {kafka/fuzzy-operator-532-qebtjv/2}. Giving up ... Builds:
Module: rptest.tests.random_node_operations_test
Class: RandomNodeOperationsTest
Method: test_node_operations
Arguments: {
    "num_to_upgrade": 3,
    "enable_failures": false,
    "with_tiered_storage": true
}
test_id:    RandomNodeOperationsTest.test_node_operations
status:     FAIL
run time:   2061.472 seconds

<BadLogLines nodes=ip-172-31-33-161(2) example="ERROR 2023-11-25 23:15:40,652 [shard 0] cluster - ntp: {kafka/fuzzy-operator-2796-ncuizc/5} - archival_metadata_stm.cc:1229 - Can't apply override to kafka start offset 29, currently 29">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1321, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-33-161(2) example="ERROR 2023-11-25 23:15:40,652 [shard 0] cluster - ntp: {kafka/fuzzy-operator-2796-ncuizc/5} - archival_metadata_stm.cc:1229 - Can't apply override to kafka start offset 29, currently 29">

JIRA Link: CORE-1601

andijcr commented 11 months ago

not sure if this should be reported as error

mmaslankaprv commented 11 months ago

This was already fixed with: https://github.com/redpanda-data/redpanda/pull/15192

vbotbuildovich commented 10 months ago

https://buildkite.com/redpanda/vtools/builds/11151 https://buildkite.com/redpanda/vtools/builds/11151

vbotbuildovich commented 10 months ago

*https://buildkite.com/redpanda/vtools/builds/11202

vbotbuildovich commented 10 months ago

*https://buildkite.com/redpanda/vtools/builds/11214

vbotbuildovich commented 10 months ago

*https://buildkite.com/redpanda/vtools/builds/11243

vbotbuildovich commented 10 months ago

*https://buildkite.com/redpanda/vtools/builds/11262

andijcr commented 10 months ago

*https://buildkite.com/redpanda/vtools/builds/11262

at least the last failure mode is different:

<BadLogLines nodes=ip-172-31-13-176(2),ip-172-31-5-190(7),ip-172-31-2-19(29) example="ERROR 2023-12-22 10:09:03,417 [shard 1:au  ] archival - [fiber11 kafka/fuzzy-operator-2047-ygyvas/0] - ntp_archiver_service.cc:2832 - Failed to make reupload candidate to match the run, candidate: {source segment offsets: {term:3, base_offset:15, committed_offset:483, dirty_offset:483}, exposed_name: {15-3-v1.log}, starting_offset: 15, file_offset: 0, content_length: 5108, final_offset: 483, final_file_offset: 5108, term: 3, source names: {/var/lib/redpanda/data/kafka/fuzzy-operator-2047-ygyvas/0_6696/15-3-v1.log}}, run: {is_compacted: true, size_bytes: 5224, base_offset: 15, committed_offset: 483, base_timestamp: {timestamp: 1703239361564}, max_timestamp: {timestamp: 1703239430338}, delta_offset: 15, ntp_revision: 64, archiver_term: 4, segment_term: 3, delta_offset_end: 25, sname_format: {v3}, metadata_size_hint: 0}">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1325, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-13-176(2),ip-172-31-5-190(7),ip-172-31-2-19(29) example="ERROR 2023-12-22 10:09:03,417 [shard 1:au  ] archival - [fiber11 kafka/fuzzy-operator-2047-ygyvas/0] - ntp_archiver_service.cc:2832 - Failed to make reupload candidate to match the run, candidate: {source segment offsets: {term:3, base_offset:15, committed_offset:483, dirty_offset:483}, exposed_name: {15-3-v1.log}, starting_offset: 15, file_offset: 0, content_length: 5108, final_offset: 483, final_file_offset: 5108, term: 3, source names: {/var/lib/redpanda/data/kafka/fuzzy-operator-2047-ygyvas/0_6696/15-3-v1.log}}, run: {is_compacted: true, size_bytes: 5224, base_offset: 15, committed_offset: 483, base_timestamp: {timestamp: 1703239361564}, max_timestamp: {timestamp: 1703239430338}, delta_offset: 15, ntp_revision: 64, archiver_term: 4, segment_term: 3, delta_offset_end: 25, sname_format: {v3}, metadata_size_hint: 0}">
vbotbuildovich commented 9 months ago

https://buildkite.com/redpanda/vtools/builds/11301 https://buildkite.com/redpanda/vtools/builds/11329 *https://buildkite.com/redpanda/vtools/builds/11351

vbotbuildovich commented 9 months ago

https://buildkite.com/redpanda/vtools/builds/11301 https://buildkite.com/redpanda/vtools/builds/11367

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11490

vbotbuildovich commented 9 months ago

https://buildkite.com/redpanda/vtools/builds/11502 https://buildkite.com/redpanda/vtools/builds/11516

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11527

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11585

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11616

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11630

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11630

vbotbuildovich commented 9 months ago

https://buildkite.com/redpanda/vtools/builds/11651 <BadLogLines nodes=rp-node-9-needlessly-choice-egret(1) example="ERROR 2024-01-30 09:22:56,880 [shard 2:au ] cloud_storage - [fiber76~122|0|29632ms] - remote.cc:1243 - Failed to delete keys: Semaphore broken: self_config_barrier"> this specific instance of failure will be cleaned up here https://github.com/redpanda-data/redpanda/pull/16385

vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11668

TimeoutError('Redpanda service rp-node-14-early-subtle-jennet failed to start within 360 sec')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 99, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/tests/random_node_operations_test.py", line 428, in test_node_operations
    executor.execute_operation(op)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 430, in execute_operation
    self.add(operation.node)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 388, in add
    self.redpanda.start_node(node,
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2487, in start_node
    self.start_service(node, start_rp)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2563, in start_service
    start()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 2478, in start_rp
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda service rp-node-14-early-subtle-jennet failed to start within 360 sec
vbotbuildovich commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11704

andijcr commented 9 months ago

*https://buildkite.com/redpanda/vtools/builds/11704

<BadLogLines nodes=rp-node-11-perfectly-handy-oryx(1) example="ERROR 2024-02-03 09:04:19,088 [shard 1:au  ] archival - [fiber1~8|0|19915ms] - purger.cc:168 - Permanent failures encountered while purging partition {kafka/fuzzy-operator-6695-lqmlrr/7}. Giving up ...">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 168, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1413, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=rp-node-11-perfectly-handy-oryx(1) example="ERROR 2024-02-03 09:04:19,088 [shard 1:au  ] archival - [fiber1~8|0|19915ms] - purger.cc:168 - Permanent failures encountered while purging partition {kafka/fuzzy-operator-6695-lqmlrr/7}. Giving up ...">

a look at the log points out to the manifest for the partition not being in cloud storage, so maybe an edge case of a manifest not yet uploaded or a race in deleting it https://github.com/redpanda-data/redpanda/blob/72b7011faf7a1cb29765db2a2b4343667a03cc0a/src/v/archival/purger.cc#L239-L264

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/11756

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/11766

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/11824

vbotbuildovich commented 8 months ago

https://buildkite.com/redpanda/vtools/builds/11845 https://buildkite.com/redpanda/vtools/builds/11858

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/11870

vbotbuildovich commented 8 months ago

https://buildkite.com/redpanda/vtools/builds/11895 https://buildkite.com/redpanda/vtools/builds/11902

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/11929

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/12024

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/12025

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/12035

vbotbuildovich commented 8 months ago

*https://buildkite.com/redpanda/vtools/builds/12114

vbotbuildovich commented 7 months ago

*https://buildkite.com/redpanda/vtools/builds/12156#018e155b-17dd-46bb-b84f-7d8d535f4147

vbotbuildovich commented 7 months ago

*https://buildkite.com/redpanda/vtools/builds/12383

vbotbuildovich commented 7 months ago

https://buildkite.com/redpanda/vtools/builds/12434 https://buildkite.com/redpanda/vtools/builds/12434

vbotbuildovich commented 7 months ago

*https://buildkite.com/redpanda/vtools/builds/12455

vbotbuildovich commented 7 months ago

*https://buildkite.com/redpanda/vtools/builds/12521

vbotbuildovich commented 6 months ago

*https://buildkite.com/redpanda/vtools/builds/13280

vbotbuildovich commented 4 months ago

*https://buildkite.com/redpanda/redpanda/builds/50094

vbotbuildovich commented 4 months ago

*https://buildkite.com/redpanda/redpanda/builds/50094

vbotbuildovich commented 4 months ago

*https://buildkite.com/redpanda/redpanda/builds/50094

vbotbuildovich commented 4 months ago

https://buildkite.com/redpanda/redpanda/builds/50094 https://buildkite.com/redpanda/vtools/builds/14609 *https://buildkite.com/redpanda/vtools/builds/14625

vbotbuildovich commented 4 months ago

https://buildkite.com/redpanda/vtools/builds/14812 https://buildkite.com/redpanda/vtools/builds/14865

vbotbuildovich commented 4 months ago

*https://buildkite.com/redpanda/vtools/builds/14847

vbotbuildovich commented 2 months ago

*https://buildkite.com/redpanda/vtools/builds/16332

vbotbuildovich commented 2 months ago

*https://buildkite.com/redpanda/vtools/builds/16383

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/16399 https://buildkite.com/redpanda/vtools/builds/16408

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/16399 https://buildkite.com/redpanda/vtools/builds/16399 https://buildkite.com/redpanda/vtools/builds/16399 https://buildkite.com/redpanda/vtools/builds/16408 https://buildkite.com/redpanda/vtools/builds/16408 https://buildkite.com/redpanda/vtools/builds/16408 https://buildkite.com/redpanda/vtools/builds/16407 https://buildkite.com/redpanda/vtools/builds/16407 https://buildkite.com/redpanda/vtools/builds/16407 https://buildkite.com/redpanda/vtools/builds/16407

vbotbuildovich commented 2 months ago

*https://buildkite.com/redpanda/vtools/builds/16412

vbotbuildovich commented 2 months ago

https://buildkite.com/redpanda/vtools/builds/16412 https://buildkite.com/redpanda/vtools/builds/16420