sudanl opened this issue 1 week ago
cc @heheda12345
Does Hugging Face stop with the same prompt & image?
I ran the example script from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct#use-with-transformers and it works well.
When I put the same example image & prompt into https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py, it doesn't stop.
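For reference, the Transformers-side run is roughly the snippet below (a condensed sketch of the model-card usage; the image URL and question are placeholders I'm using for illustration, not the exact ones from the model card):

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Load the vision-language model and its processor.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder inputs; substitute the image/prompt from the model card.
image = Image.open(requests.get("https://example.com/example.jpg", stream=True).raw)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, return_tensors="pt").to(model.device)

# Here generation stops on the model's end-of-turn token well before
# max_new_tokens is reached.
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```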
To check whether it is caused by `stop_token_ids`, you can check whether the output logprobs here are correct:
https://github.com/vllm-project/vllm/blob/c5d7fb9ddc16d9eb68f1018cfb384faf3be301be/vllm/model_executor/models/mllama.py#L1078
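One user-side way to inspect this (without adding prints inside mllama.py) is to request per-token logprobs through `SamplingParams` and see whether a stop token ever becomes likely. A minimal sketch, assuming the prompt/image are set up the same way as in the example script (the `<|eot_id|>` stop token and the placeholder prompt/URL are my assumptions):

```python
import requests
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-11B-Vision-Instruct",
          max_model_len=4096, max_num_seqs=2, enforce_eager=True)

tokenizer = llm.get_tokenizer()
# Token ids that should end generation; <|eot_id|> is assumed here in
# addition to the tokenizer's EOS -- adjust if the model uses others.
stop_ids = {tokenizer.eos_token_id,
            tokenizer.convert_tokens_to_ids("<|eot_id|>")}

# Placeholder prompt/image -- in practice reuse the exact prompt string and
# image that the example script builds for this model.
prompt = "<|image|>Describe this image in one sentence."
image = Image.open(requests.get("https://example.com/example.jpg", stream=True).raw)

sampling_params = SamplingParams(temperature=0.0, max_tokens=1024, logprobs=5)
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params,
)

# Each step holds the top-5 candidate tokens with their logprobs; check whether
# a stop token is ever ranked highly, or whether the model never predicts one.
for step, top in enumerate(outputs[0].outputs[0].logprobs):
    for tok_id, lp in top.items():
        if tok_id in stop_ids:
            print(f"step {step}: stop token {tok_id} logprob {lp.logprob:.3f}")
```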
Hi, I just updated vLLM to the latest code, and the same problem still occurs. Could you describe in more detail how to check for the `stop_token_ids` problem by outputting the logprobs?
Hope these tips can be helpful for you. You can also check whether the `SamplingParams` on both sides are the same.
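One concrete part of that comparison is the stop-token configuration on each side; a minimal sketch for eyeballing the values (nothing here is taken from the issue itself):

```python
from transformers import AutoTokenizer, GenerationConfig
from vllm import SamplingParams

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Transformers stops on the eos_token_id(s) listed in the generation config.
gen_cfg = GenerationConfig.from_pretrained(model_id)
print("HF generation_config eos_token_id:", gen_cfg.eos_token_id)

# vLLM relies on the model's EOS, plus anything passed explicitly via
# stop / stop_token_ids in SamplingParams (none are passed here).
tok = AutoTokenizer.from_pretrained(model_id)
params = SamplingParams(max_tokens=1024)
print("tokenizer eos_token_id:", tok.eos_token_id)
print("SamplingParams stop_token_ids:", params.stop_token_ids)
```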
Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
Model Input Dumps
No response
🐛 Describe the bug
Directly running https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py with `max_tokens` set to 1024 in the code, the model output continues until the maximum number of tokens is reached and does not stop early.
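The only edit to the example script is essentially the line below (a sketch; the temperature shown is illustrative, everything else in the script is unchanged):

```python
from vllm import SamplingParams

# Raise max_tokens to 1024; all other settings stay as in
# examples/offline_inference_vision_language.py.
sampling_params = SamplingParams(temperature=0.2, max_tokens=1024)
```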
Is that because the `stop_token_ids` setting is incorrect?
Before submitting a new issue...