redpanda-data / redpanda


Incorrectly configured cloud storage results in CPU spinning #12376

Open tmgstevens opened 1 year ago

tmgstevens commented 1 year ago

Version & Environment

Redpanda version (from rpk version): 22.3.15 and 23.1.13

What went wrong?

The cluster was provisioned with an incorrect cloud storage configuration (wrong bucket name). When topics and partitions were added, Redpanda spun the CPU. When the topics were then deleted, Redpanda crashed with a segmentation fault:

Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]: Backtrace:
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x534f8c6
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x53b3226
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   /opt/redpanda/lib/libc.so.6+0x42abf
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x38c3774
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x4a40c19
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x536d9ff
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x53716d7
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x536eaa9
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x528f641
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x528d75f
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x1b3529e
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x568344d
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   /opt/redpanda/lib/libc.so.6+0x2d58f
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   /opt/redpanda/lib/libc.so.6+0x2d648
Jul 20 10:49:17 ip-172-31-11-21 rpk[5060]:   0x1b2ffe4

What should have happened instead?

Cloud storage requests need a backoff loop or a retry rate limit, so that a misconfigured bucket does not leave Redpanda spinning the CPU.
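
To illustrate the kind of backoff meant here, below is a minimal Python sketch of capped exponential backoff with jitter. It is not Redpanda's code and says nothing about how its cloud storage client actually retries; the names call_with_backoff, backoff_delay and RetryBudgetExceeded, and all default values, are hypothetical and chosen only for this example.

```python
import random
import time


class RetryBudgetExceeded(Exception):
    """Raised when every attempt failed (hypothetical name for this sketch)."""


def backoff_delay(attempt, base, cap, rng=random.random):
    """Return a jittered delay in seconds for a zero-based attempt number.

    The upper bound doubles with each attempt and is clamped to `cap`;
    the actual delay is a random fraction of that bound ("full jitter").
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_backoff(op, *, base=0.1, cap=30.0, attempts=8,
                      sleep=time.sleep, rng=random.random):
    """Call op() until it succeeds or the attempt budget is exhausted.

    `sleep` and `rng` are injectable so the behaviour can be tested
    without real waiting. All defaults are illustrative assumptions.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return op()
        except Exception as err:  # a real client would catch only retryable errors
            last_error = err
            # Sleep between attempts, but not after the final failure.
            if attempt + 1 < attempts:
                sleep(backoff_delay(attempt, base, cap, rng))
    raise RetryBudgetExceeded(f"gave up after {attempts} attempts") from last_error
```

The point of the cap and the attempt budget is that an operation which can never succeed, such as a request against a bucket that does not exist, ends up waiting between tries and eventually gives up instead of retrying in a tight loop.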

How to reproduce the issue?

  1. Provision a cluster
  2. Enable tiered storage
  3. Configure a wrong bucket name
  4. Add topics and partitions, then delete the topics

Additional information


JIRA Link: CORE-1378

jcsp commented 1 year ago

Cloud storage already has various retries, and those include backoffs, but clearly something is wrong in the specific case of a cluster configured to point to a bucket that doesn't exist.

github-actions[bot] commented 8 months ago

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

tmgstevens commented 8 months ago

👍

dotnwat commented 6 months ago

Should try to reproduce this.