huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

max_batch_size limit doesn't work well at queue.next_batch() #2241

Open · AndersWXJY opened this issue 3 months ago

AndersWXJY commented 3 months ago

https://github.com/huggingface/text-generation-inference/blob/4dfdb481fb1f9cf31561c056061d693f38ba4168/router/src/infer/v3/queue.rs#L362

When max_size is 0 but batch_requests.len() > 0, the max_batch_size limit is not enforced.
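
To illustrate: the limit at the linked line is, roughly, an equality check between the batch length and max_size, made only after a request has already been pushed, so a max_size of Some(0) can never match. A minimal, self-contained sketch of that pattern (simplified types and names, not the actual queue code):

// Simplified stand-in for the batching loop: `pending` replaces the real
// queue entries, and only the size-limit check is reproduced here.
fn next_batch_sketch(pending: &mut Vec<u64>, max_size: Option<usize>) -> Option<Vec<u64>> {
    let mut batch_requests = Vec::new();

    while let Some(id) = pending.pop() {
        batch_requests.push(id);

        // Equality-only check: with max_size == Some(0) this never fires,
        // because batch_requests.len() is already >= 1 at this point.
        if Some(batch_requests.len()) == max_size {
            break;
        }
    }

    if batch_requests.is_empty() {
        None
    } else {
        Some(batch_requests)
    }
}

fn main() {
    let mut pending = vec![1, 2, 3];
    // Zero slots remain, so no batch should be produced...
    let batch = next_batch_sketch(&mut pending, Some(0));
    // ...but the sketch returns all three requests.
    println!("{:?}", batch); // Some([3, 2, 1])
}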

ErikKaum commented 3 months ago

Hi @AndersWXJY 👋

Could you be a bit more specific? Does this cause a crash, or do you mean that max_batch_size isn't enforced when it should be?

Also if you have an example of how to reproduce the bug/unwanted behavior, that would help a lot 👍

AndersWXJY commented 3 months ago

Yes, I mean that max_batch_size isn't enforced when it should be. I am working on a customization for a new device. As a backend for text-generation-server, it has a hard limit on batch size at the decode stage (for example 32), even when max_batch_prefill_tokens and max_batch_total_tokens are within their limits. So the batching task can send a new batch that the server backend cannot process. I inserted the code segment below at the beginning of State::next_batch() to make it work correctly.

// Check if we have any batch slots left; max_size is an Option<usize>,
// so a remaining budget of 0 means the running batch is already full
if let Some(max_size) = max_size {
    if max_size == 0 {
        tracing::debug!("Not enough batch slots for new requests!");
        return None;
    }
}
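
For context on how max_size reaches 0 in the first place: the scheduler side computes the remaining slot budget from the configured max_batch_size and the number of requests already running, so a full running batch leaves exactly 0. A hedged sketch of that derivation (the names here are placeholders, not the scheduler's actual API):

// Hypothetical helper: `max_batch_size` and `running_len` stand in for the
// scheduler's real state; saturating_sub keeps the budget at 0 instead of
// underflowing.
fn remaining_slots(max_batch_size: Option<usize>, running_len: usize) -> Option<usize> {
    max_batch_size.map(|max| max.saturating_sub(running_len))
}

fn main() {
    // 32 decode slots configured, 32 requests already running -> Some(0)
    let max_size = remaining_slots(Some(32), 32);
    assert_eq!(max_size, Some(0));
    // With the early-return guard above, next_batch() returns None for
    // Some(0) instead of assembling a batch the backend cannot accept.
}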
github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.