Let's assume we have two different buckets: bucket A with a high capacity unit and bucket B with a low capacity unit. All requests to bucket A are served in a very short time, while all requests to bucket B take a very long time for the backend to process. Let's also assume that requests for both buckets arrive at a similar rate.
In such a scenario, there will likely be moments when bucket A is empty (because its requests leave so quickly). At those moments, incoming requests for bucket B would let it expand to the global max_requests defined in the limits configuration, since there are no competing requests in the other bucket. While these slow requests are being processed, new high-priority requests might arrive for bucket A, but they would be rejected (unless a large buffer is defined).
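The starvation effect can be sketched with a tiny discrete-time simulation. This is not the actual limiter implementation; the names (GLOBAL_MAX, B_SERVICE, A_SERVICE) and the admission rule are illustrative assumptions that only mirror the scenario described above.

```python
GLOBAL_MAX = 4      # assumed shared max_requests across both buckets
B_SERVICE = 10      # slow backend: ticks needed per bucket-B request
A_SERVICE = 1       # fast backend: ticks needed per bucket-A request

in_flight = []      # remaining service ticks of admitted requests
rejected_A = 0

# Phase 1: bucket A is momentarily idle, so only B requests arrive.
# With no competition, B expands until it fills the whole global limit.
for _ in range(GLOBAL_MAX):
    if len(in_flight) < GLOBAL_MAX:
        in_flight.append(B_SERVICE)

# Phase 2: new high-priority bucket-A requests arrive while B's slow
# requests are still in flight -- every slot is taken, so A is rejected.
for _ in range(3):
    if len(in_flight) < GLOBAL_MAX:
        in_flight.append(A_SERVICE)
    else:
        rejected_A += 1

print(rejected_A)  # 3: all A requests bounce despite their short service time
```

A larger buffer (a higher admission threshold for A in phase 2) would absorb these requests instead of rejecting them, which is the mitigation the text alludes to.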