Your current environment
🐛 Describe the bug
[Bug]: 2024-05-21T06:25:30.209697Z ERROR generate_stream{parameters=GenerateParameters { best_of: Some(1), temperature: Some(0.01), repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: Some(0.99), typical_p: None, do_sample: true, max_new_tokens: Some(393), return_full_text: None, stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None }}:async_stream:generate_stream: text_generation_router::infer: router/src/infer.rs:130: no permits available
Hello, when I ran benchmark_serving.py against the TGI backend, I got the error above. With --num-prompts set to 256 and --request-rate set to 32, fewer than 256 requests succeed (--max-concurrent-requests was set to 200).
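To show how I think the requests might be getting dropped, here is a toy model. It assumes the router enforces --max-concurrent-requests with a non-blocking permit check and rejects any request that finds no free permit, which is what the `no permits available` message suggests. It also assumes Poisson arrivals at --request-rate, and that every request holds its permit for a fixed, hypothetical `service_time_s`. This is a sketch under those assumptions, not TGI's actual code.

```python
import random


def simulate(num_prompts: int, request_rate: float, max_concurrent: int,
             service_time_s: float, seed: int = 0) -> int:
    """Return how many requests a non-blocking concurrency cap rejects.

    Hypothetical model (not TGI's implementation):
    - arrivals are a Poisson process with rate `request_rate` (exponential gaps)
    - each admitted request holds one permit for `service_time_s` seconds
    - a request arriving while all `max_concurrent` permits are held is
      rejected immediately instead of waiting in a queue
    """
    rng = random.Random(seed)
    t = 0.0
    finish_times: list[float] = []  # completion times of in-flight requests
    rejected = 0
    for _ in range(num_prompts):
        t += rng.expovariate(request_rate)
        # Release permits held by requests that have already finished.
        finish_times = [f for f in finish_times if f > t]
        if len(finish_times) >= max_concurrent:
            rejected += 1  # analogous to "no permits available"
        else:
            finish_times.append(t + service_time_s)
    return rejected


if __name__ == "__main__":
    # 256 prompts at 32 req/s arrive over roughly 8 s. If each generation
    # takes longer than that, in-flight requests exceed the 200-permit cap.
    print(simulate(256, 32.0, 200, service_time_s=10.0))
```

If every request runs longer than the whole arrival window, none finish before the last prompt arrives, so exactly 256 − 200 = 56 requests are rejected. That would match seeing fewer than 256 successes.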
Can anyone help me? Thank you