AndersWXJY opened this issue 3 months ago
Hi @AndersWXJY 👋
Could you be a bit more specific? Does this cause a crash, or do you mean that the max_batch_size isn't enforced even when it should be?
Also, if you have an example of how to reproduce the bug/unwanted behavior, that would help a lot 👍
Yeah, I mean that the max_batch_size isn't enforced even when it should be.
I am working on a customization for a new device. As a backend for text-generation-server, the device has a hard limit on batch size at the decode stage (e.g. 32), even when max_batch_prefill_tokens and max_batch_total_tokens are under their limits. So the batching task can send a new batch that the server backend cannot process. I inserted the code segment below at the beginning of State::next_batch() to make it work correctly:
```rust
// Check if we have enough batch slots
if let Some(max_size) = max_size {
    if max_size == 0 {
        tracing::debug!("Not enough batch slots for new requests!");
        return None;
    }
}
```
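For context, max_size can legitimately reach zero when the caller derives it from the configured max_batch_size minus the number of requests already decoding. Below is a minimal, self-contained sketch of that arithmetic; the remaining_slots helper and its names are my own illustration, not the actual TGI code.

```rust
// Illustrative sketch (not the actual TGI source): if the scheduler
// derives `max_size` from the configured `max_batch_size` minus the
// number of requests already decoding, it can legitimately pass
// Some(0) into `next_batch` once the decode batch is full.
fn remaining_slots(max_batch_size: Option<usize>, running: usize) -> Option<usize> {
    // saturating_sub avoids an underflow panic if `running` ever
    // exceeds the configured limit.
    max_batch_size.map(|max| max.saturating_sub(running))
}

fn main() {
    // max_batch_size = 32 with 32 requests already decoding:
    // the next call to next_batch would receive max_size = Some(0).
    assert_eq!(remaining_slots(Some(32), 32), Some(0));
    // No configured limit: no cap is applied.
    assert_eq!(remaining_slots(None, 32), None);
}
```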
https://github.com/huggingface/text-generation-inference/blob/4dfdb481fb1f9cf31561c056061d693f38ba4168/router/src/infer/v3/queue.rs#L362
When max_size is 0 but batch_requests.len() > 0, the max_batch_size limit fails to be enforced.
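To illustrate why: the size check at that line only runs after a request has already been pushed, so batch_requests.len() is at least 1 and can never equal a max_size of 0. Below is a simplified, self-contained sketch of the loop shape with the proposed early return folded in; the names and structure are illustrative, not the exact queue.rs source.

```rust
// Simplified sketch of the batching loop shape (illustrative only).
// `pending` stands in for the queue of waiting request ids.
fn next_batch_sketch(pending: &mut Vec<u64>, max_size: Option<usize>) -> Option<Vec<u64>> {
    // Proposed fix: bail out before touching the queue when there
    // are no free batch slots at all.
    if max_size == Some(0) {
        return None;
    }

    let mut batch_requests = Vec::new();
    while let Some(id) = pending.pop() {
        batch_requests.push(id);

        // Without the early return above, this is the only size
        // check: it runs after a push, so `batch_requests.len()` is
        // already >= 1 and `Some(len)` can never equal `Some(0)`.
        if Some(batch_requests.len()) == max_size {
            break;
        }
    }

    if batch_requests.is_empty() {
        None
    } else {
        Some(batch_requests)
    }
}
```

With the early return in place, a full decode batch simply yields no new batch instead of silently overshooting the configured limit.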