huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Request failed during generation: Server error: Batch ID 408 not found in cache. #2316

Open leizhao1234 opened 1 month ago

leizhao1234 commented 1 month ago

I have encountered the problem mentioned in the title. Could someone help me understand what is going on and how to resolve it? Any assistance would be greatly appreciated.

ErikKaum commented 1 month ago

Hi @leizhao1234 👋

Could you describe a way to reproduce the error? E.g. the command you use to launch the server and your setup.

leizhao1234 commented 1 month ago

CUDA_VISIBLE_DEVICES="3" text-generation-launcher --model-id ./cogvlm_exp3_03_2001/ --num-shard 1 --port 8083 --max-concurrent-requests 409600 --max-input-length 8190 --max-total-tokens 8192 --max-batch-prefill-tokens 8192 --trust-remote-code --max-waiting-tokens 2 --waiting-served-ratio 0.2 --block-size 32 --cuda-memory-fraction 0.97
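For context, requests against a server launched this way go to TGI's /generate endpoint on the chosen port. A minimal example request (the prompt and generation parameters below are placeholders, not taken from the report; only the port comes from the command above):

curl 127.0.0.1:8083/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'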

[Screenshot 2024-07-26 17:47:01: error output]
leizhao1234 commented 1 month ago

and here is another problem:

[Screenshot 2024-07-26 17:47:30: error output]
ErikKaum commented 1 month ago

Thank you! And are you using the Docker container, or have you built this from source?

leizhao1234 commented 1 month ago

I built from source and added support for CogVLM.
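One way to narrow this down (a sketch, not a confirmed fix): run the official container with the same launcher flags and check whether the error still occurs, which would help separate the custom CogVLM changes from upstream behaviour. The image tag below is an assumption (pick a release tag matching your source checkout), and the local model path is mounted into the container:

docker run --gpus '"device=3"' --shm-size 1g -p 8083:80 \
    -v $PWD/cogvlm_exp3_03_2001:/data/model \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id /data/model --max-input-length 8190 --max-total-tokens 8192 \
    --max-batch-prefill-tokens 8192 --trust-remote-code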