huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Request failed during generation: Server error: Batch ID 408 not found in cache. #2316

Open leizhao1234 opened 1 month ago

leizhao1234 commented 1 month ago

I have encountered the problem mentioned in the title. Could someone help me understand what is going on and how to resolve it? Any assistance would be greatly appreciated.

ErikKaum commented 1 month ago

Hi @leizhao1234 👋

Could you describe a way to reproduce the error? E.g. the command you use to launch the server and your setup.

leizhao1234 commented 1 month ago

CUDA_VISIBLE_DEVICES="3" text-generation-launcher --model-id ./cogvlm_exp3_03_2001/ --num-shard 1 --port 8083 --max-concurrent-requests 409600 --max-input-length 8190 --max-total-tokens 8192 --max-batch-prefill-tokens 8192 --trust-remote-code --max-waiting-tokens 2 --waiting-served-ratio 0.2 --block-size 32 --cuda-memory-fraction 0.97
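For context, requests against a server launched this way go to TGI's /generate endpoint on the chosen port. A minimal example request (the prompt and generation parameters below are placeholders, not taken from the report; only the port comes from the command above):

curl 127.0.0.1:8083/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'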

[Screenshot 2024-07-26 17:47:01: error output]
leizhao1234 commented 1 month ago

and here is another problem:

[Screenshot 2024-07-26 17:47:30: error output]
ErikKaum commented 1 month ago

Thank you! And are you using the Docker container, or have you built this from source?

leizhao1234 commented 1 month ago

I built from source and added support for CogVLM.
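One way to narrow this down (a sketch, not a confirmed fix): run the official container with the same launcher flags and check whether the error still occurs, which would help separate the custom CogVLM changes from upstream behaviour. The image tag below is an assumption (pick a release tag matching your source checkout), and the local model path is mounted into the container:

docker run --gpus '"device=3"' --shm-size 1g -p 8083:80 \
    -v $PWD/cogvlm_exp3_03_2001:/data/model \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id /data/model --max-input-length 8190 --max-total-tokens 8192 \
    --max-batch-prefill-tokens 8192 --trust-remote-code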