redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

cloud_storage: cancel/limit reader fibers when clients timeout RPCs #11799

Status: Open. jcsp opened this issue 1 year ago.

jcsp commented 1 year ago

If a system is overloaded, such that tiered storage reads (including timequeries) are not completing promptly, then Kafka clients will tend to close their connections and issue another request.

When a tiered storage read is in flight, we do not cancel it if the client that sent the original Kafka request closes its connection: the read remains enqueued and runs to completion.

On an overloaded system, this can lead to unbounded growth in the number of reads in flight, because each client retry enqueues a new read while the abandoned ones keep running.
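The failure mode above can be reproduced with a small simulation. This is a sketch in Python asyncio, not Redpanda's C++/Seastar code, and `slow_tiered_read`, `serve_fetch` and `simulate` are hypothetical names. Each simulated client waits 10 ms, gives up and retries. Without cancellation every abandoned read keeps running; tying each read to a "client gone" signal lets the handler cancel it instead.

```python
import asyncio


async def slow_tiered_read() -> bytes:
    # Hypothetical stand-in for a tiered storage read on an overloaded node:
    # it takes far longer than any client is willing to wait.
    await asyncio.sleep(60)
    return b"segment data"


async def serve_fetch(client_gone: asyncio.Event, in_flight: set,
                      cancel_on_disconnect: bool) -> None:
    """Handle one request; optionally cancel the read if the client leaves."""
    read = asyncio.create_task(slow_tiered_read())
    in_flight.add(read)
    read.add_done_callback(in_flight.discard)  # untrack the read once it ends
    gone = asyncio.create_task(client_gone.wait())
    # Wait until either the read completes or the client disconnects.
    await asyncio.wait({read, gone}, return_when=asyncio.FIRST_COMPLETED)
    if not read.done() and cancel_on_disconnect:
        read.cancel()  # nobody is waiting for the result any more
    gone.cancel()


async def simulate(retries: int, cancel_on_disconnect: bool) -> int:
    """Return how many reads are still in flight after `retries` timeouts."""
    in_flight: set = set()
    for _ in range(retries):
        client_gone = asyncio.Event()
        handler = asyncio.create_task(
            serve_fetch(client_gone, in_flight, cancel_on_disconnect))
        await asyncio.sleep(0.01)  # the client waits briefly...
        client_gone.set()          # ...then times out, disconnects and retries
        await handler
    await asyncio.sleep(0.01)      # let pending cancellations settle
    still_running = len(in_flight)
    for task in list(in_flight):   # clean up before the event loop shuts down
        task.cancel()
    return still_running


if __name__ == "__main__":
    print(asyncio.run(simulate(20, cancel_on_disconnect=False)))  # 20
    print(asyncio.run(simulate(20, cancel_on_disconnect=True)))   # 0
```

In a real broker the "client gone" signal would come from the connection closing or the request timing out; how that signal would be plumbed through to the tiered storage reader is an assumption of this sketch.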

There are a couple of angles to addressing this:

JIRA Link: CORE-1358

jcsp commented 1 year ago

The more specific case of this for timequeries is https://github.com/redpanda-data/redpanda/issues/10854, but we probably need a solution for all fetch requests.
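Since the fix probably needs to cover every fetch, not only timequeries, one option is a single wrapper that all tiered reads pass through, combining the "limit" and "cancel" halves of the issue title. The sketch below uses Python asyncio for illustration only; `ReadLimiter`, `run` and `demo` are hypothetical names, and rejecting excess reads outright (rather than queueing them) is an assumed policy, not something this thread decides.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class ReadLimiter:
    """Hypothetical admission control shared by every kind of tiered read:
    at most `max_in_flight` reads run at once, and a read is cancelled as
    soon as its client goes away."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = asyncio.Semaphore(max_in_flight)
        self.rejected = 0

    async def run(self, make_read: Callable[[], Awaitable[T]],
                  client_gone: asyncio.Event) -> Optional[T]:
        # Shed load instead of queueing without bound when every slot is busy.
        if self._slots.locked():
            self.rejected += 1
            return None
        async with self._slots:
            read = asyncio.ensure_future(make_read())
            gone = asyncio.ensure_future(client_gone.wait())
            try:
                await asyncio.wait({read, gone},
                                   return_when=asyncio.FIRST_COMPLETED)
                if read.done():
                    return read.result()
            finally:
                gone.cancel()
                read.cancel()  # no-op if the read already finished
            await asyncio.wait({read})  # free the slot once the read has stopped
            return None


async def demo() -> Tuple[List[Optional[bytes]], int, Optional[bytes]]:
    limiter = ReadLimiter(max_in_flight=2)

    async def slow_read() -> bytes:  # hypothetical read on an overloaded node
        await asyncio.sleep(60)
        return b"late"

    async def fast_read() -> bytes:  # hypothetical read that completes quickly
        return b"ok"

    clients = [asyncio.Event() for _ in range(5)]
    reads = [asyncio.create_task(limiter.run(slow_read, c)) for c in clients]
    await asyncio.sleep(0.01)       # two reads admitted, three shed
    for c in clients:
        c.set()                     # every client times out and disconnects
    results = await asyncio.gather(*reads)
    fast = await limiter.run(fast_read, asyncio.Event())
    return results, limiter.rejected, fast


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([None, None, None, None, None], 3, b'ok')
```

Rejecting immediately keeps the number of reads in flight bounded by construction; a broker could instead wait for a slot with a bounded queue, or answer with an empty fetch so the client backs off. The thread does not say which behaviour is intended.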

BenPope commented 1 year ago

Unassigning myself since #12021 doesn't quite get us there.

VladLazar commented 1 year ago

@BenPope, could you drop a note on what the gap is between the improvements from #12021 and this ticket?

BenPope commented 1 year ago

#12021 only covers the last bullet point above.

github-actions[bot] commented 8 months ago

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

piyushredpanda commented 8 months ago

This is still something we need to pursue.