A high-throughput and memory-efficient inference and serving engine for LLMs
[Bug]: ERROR 07-31 11:57:33 async_llm_engine.py:658] Engine iteration timed out. This should never happen! #6969
Open
lucasjinreal opened 1 month ago
Your current environment
from vllm import SamplingParams
sampling_params = SamplingParams(temperature=0.2, max_tokens=1024, stop=["<|im_start|>", "<|im_end|>"], skip_special_tokens=False)