redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

[23.1.x] CI Failure (Arm: BadLogLines connection reset by peer in hydration loop) in `ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy` #14169

Closed NyaliaLui closed 1 year ago

NyaliaLui commented 1 year ago

https://buildkite.com/redpanda/redpanda/builds/38877#018b26e6-ea69-4176-a548-425416b8c5e4

Module: rptest.tests.e2e_shadow_indexing_test
Class:  ShadowIndexingWhileBusyTest
Method: test_create_or_delete_topics_while_busy
Arguments:
{
  "cloud_storage_type": 1,
  "short_retention": true
}
test_id:    rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.S3
status:     FAIL
run time:   10 minutes 9.258 seconds

    <BadLogLines nodes=ip-172-31-11-75(1),ip-172-31-14-75(1) example="ERROR 2023-10-13 10:17:43,365 [shard 2] cloud_storage - [fiber11~10 kafka/topic-hnzjwpanso/10 [245:286]] - remote_segment.cc:706 - Error in hydration loop: std::__1::system_error (error system:104, read: Connection reset by peer)">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 68, in wrapped
    self.redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1900, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-11-75(1),ip-172-31-14-75(1) example="ERROR 2023-10-13 10:17:43,365 [shard 2] cloud_storage - [fiber11~10 kafka/topic-hnzjwpanso/10 [245:286]] - remote_segment.cc:706 - Error in hydration loop: std::__1::system_error (error system:104, read: Connection reset by peer)">

NyaliaLui commented 1 year ago

Marking as sev/low since it's BadLogLines
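
For readers unfamiliar with this class of failure: the test body passed, and the run was failed afterwards because an ERROR-level line was found in a node log. The sketch below illustrates that kind of log scan against the example line from this report. It is not the rptest implementation; `find_bad_lines`, the regular expressions, and the `TRANSIENT_CODES` allow list are hypothetical names made up for illustration, and the thread does not say whether allow-listing is the fix that was chosen.

```python
import re

# Hypothetical pattern for the log line layout seen in this report:
# LEVEL, timestamp, [shard N], logger name, " - ", message.
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\[shard (?P<shard>\d+)\]\s+"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)
SYSTEM_ERROR = re.compile(r"error system:(?P<code>\d+)")

# Hypothetical allow list of system error codes to treat as transient.
# 104 is ECONNRESET ("Connection reset by peer") on Linux.
TRANSIENT_CODES = frozenset({104})


def find_bad_lines(lines, transient_codes=frozenset()):
    """Return ERROR-level lines whose system error code is not allow-listed."""
    bad = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("level") != "ERROR":
            continue  # not a parsable ERROR line
        err = SYSTEM_ERROR.search(m.group("message"))
        if err and int(err.group("code")) in transient_codes:
            continue  # allow-listed system error
        bad.append(line)
    return bad


example = (
    "ERROR 2023-10-13 10:17:43,365 [shard 2] cloud_storage - "
    "[fiber11~10 kafka/topic-hnzjwpanso/10 [245:286]] - remote_segment.cc:706 - "
    "Error in hydration loop: std::__1::system_error "
    "(error system:104, read: Connection reset by peer)"
)

strict = find_bad_lines([example])                  # flags the line
lenient = find_bad_lines([example], TRANSIENT_CODES)  # skips the line
```

With no allow list the example line is flagged, which matches the failure above; with code 104 allow-listed the same line is skipped.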

andrwng commented 1 year ago

Looks like this still exists on dev. Perhaps the error code received here is more common on ARM.

Regardless, agreed that this seems low priority (it doesn't block consumers or anything), though we should fix it on dev.

piyushredpanda commented 1 year ago

Dev issues seem different: https://github.com/redpanda-data/redpanda/issues?q=is%3Aissue+is%3Aopen+label%3Aci-failure+-label%3Akind%2Fbackport+-label%3Aarea%2Fk8s+test_create_or_delete_topics_while_busy

But I'll close this one; the fixes for those dev issues will reach this branch via backport.