triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Feature Request: Set maximum number of in-flight requests #412

Open · TheCodeWrangler opened this issue 2 months ago

TheCodeWrangler commented 2 months ago

When unexpectedly large bursts of requests hit my application, I would like to be able to limit the number of requests that the trtllm backend will accept. Specifically, I would like to REJECT incoming requests once the number of active requests for a specific backend instance exceeds a threshold.

I have tried:

dynamic_batching {
  default_queue_policy {
    timeout_action: REJECT  # reject requests instead of delaying them
    max_queue_size: 30      # reject new requests once 30 are already queued
  }
}

But I would like a way to actually achieve this behavior so that I can better balance my load (and not have one instance with a large backlog).

TheCodeWrangler commented 2 days ago

Any plans to add this as a controllable feature?

Are there any other suggestions for how I can keep the internal queue from getting too large for a single instance?
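
In the meantime I am experimenting with gating requests on the client side. This is only a minimal sketch, assuming the Python gRPC client (tritonclient[grpc]); the URL, model name, and tensor name below are placeholders, not the actual deployment values:

import threading

import numpy as np
import tritonclient.grpc as grpcclient

MAX_IN_FLIGHT = 30  # illustrative cap, mirrors the max_queue_size above

_slots = threading.Semaphore(MAX_IN_FLIGHT)
# "localhost:8001" is a placeholder for the real Triton gRPC endpoint.
client = grpcclient.InferenceServerClient(url="localhost:8001")

def guarded_infer(input_ids: np.ndarray):
    # Reject immediately instead of queueing once the cap is reached.
    if not _slots.acquire(blocking=False):
        raise RuntimeError("overloaded: too many in-flight requests")
    try:
        # "input_ids" and "tensorrt_llm" are placeholder tensor/model names.
        inp = grpcclient.InferInput("input_ids", list(input_ids.shape), "INT32")
        inp.set_data_from_numpy(input_ids.astype(np.int32))
        return client.infer("tensorrt_llm", inputs=[inp])
    finally:
        _slots.release()

This keeps the rejection decision on the client, but it only works if every producer enforces the same cap, which is why a server-side limit in the backend would still be preferable.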