Closed: amogkam closed this issue 8 months ago
At the first output token where the generations start to diverge, the hidden states are significantly different between the HF and vLLM models.
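For reference, a minimal sketch of how the HF-side hidden states might be inspected at a given step (the vLLM side would need separate instrumentation such as forward hooks, which is not shown; the prompt below is a placeholder):

```python
# Sketch only: pulls the last-layer hidden state for the final position from the HF model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).cuda()

ids = tok("Explain the theory of relativity.", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**ids, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors shaped [batch, seq_len, hidden_dim];
# the last position of the final layer is what feeds the next-token logits.
last_hidden = out.hidden_states[-1][:, -1, :]
print(last_hidden.float().norm().item())
```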
which version/commit is this?
I can reproduce with 0.2.1.post1
Same here - does anyone know what the root cause of the issue is?
Closing this issue as stale as there has been no discussion in the past 3 months.
If you are still experiencing the issue you describe, feel free to re-open this issue.
"meta-llama/Llama-2-7b-hf"
, is returning different output vs. original HF model with a batch size of 3.This is running on a single A10G with tensor-parallel=1.
With a batch size of 1, the output is the same.
But if I use a batch_size of 3 of the same prompt, the outputs do not match for all the prompts in the batch
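A minimal reproduction sketch along these lines (not the exact script from the report; the prompt, token budget, and greedy decoding settings are assumptions):

```python
# Compare vLLM and HF greedy generations for the same prompt repeated 3 times.
import torch
from vllm import LLM, SamplingParams
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"
prompts = ["Explain the theory of relativity in one paragraph."] * 3  # batch size of 3

# vLLM: greedy decoding (temperature=0) on a single GPU.
llm = LLM(model=MODEL, tensor_parallel_size=1)
params = SamplingParams(temperature=0.0, max_tokens=64)
vllm_outputs = [o.outputs[0].text for o in llm.generate(prompts, params)]

# HF: greedy decoding for the same batch, with left padding for decoder-only generation.
tok = AutoTokenizer.from_pretrained(MODEL)
tok.pad_token = tok.eos_token
tok.padding_side = "left"
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).cuda()
inputs = tok(prompts, return_tensors="pt", padding=True).to("cuda")
gen = model.generate(**inputs, do_sample=False, max_new_tokens=64)
hf_outputs = tok.batch_decode(gen[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

for i, (v, h) in enumerate(zip(vllm_outputs, hf_outputs)):
    print(f"prompt {i}: match={v.strip() == h.strip()}")
```

With a single prompt both paths should agree under greedy decoding; the mismatch reported here only shows up once several prompts are batched together.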