Closed: RocketRider closed this issue 2 days ago
Maybe the same underlying issue as https://github.com/vllm-project/vllm/issues/9770 or https://github.com/vllm-project/vllm/issues/9670.
Not sure, but this may already be fixed in main by https://github.com/vllm-project/vllm/pull/9549, so I am closing this for now.
Your current environment
```yaml
vllm-tgi:
  container_name: vllm-tgi
  image: vllm/vllm-openai:v0.6.3.post1
  restart: always
  shm_size: "16gb"
  command: "--model /model --served-model-name mistral-large-123b --tensor-parallel-size 4 --port 8081 --api-key kitch_vllm --tokenizer_mode mistral --load_format safetensors --config_format mistral"
  ports:
```
Model Input Dumps
No response
🐛 Describe the bug
I tested Mistral-Large-2407 with v0.6.3.post1 and got really strange results when using a long context. With a small context it works well, and with v0.6.1.post2 everything works as expected.
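For reference, a minimal sketch of how the long-context case can be exercised against the OpenAI-compatible endpoint from the compose file above. The endpoint, API key, and served model name come from that config; the prompt contents, prompt length, and request parameters are assumptions for illustration, not the original test.

```python
# Hypothetical reproduction sketch: send a long prompt to the vLLM
# OpenAI-compatible server defined in the compose snippet above.
# base_url, api_key, and model come from the config; the prompt text,
# repetition count, max_tokens, and temperature are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8081/v1",
    api_key="kitch_vllm",
)

# Build an artificially long context to hit the long-context code path.
long_context = "Lorem ipsum dolor sit amet. " * 2000

response = client.chat.completions.create(
    model="mistral-large-123b",
    messages=[
        {"role": "user", "content": long_context + "\n\nSummarize the text above."},
    ],
    max_tokens=256,
    temperature=0.0,
)
print(response.choices[0].message.content)
```

With a short prompt the completion is coherent; with the long prompt on v0.6.3.post1 the output degrades as described below.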
The output looks like this: