Open thisissum opened 5 months ago
Your current environment
version: v0.4.1 device: A800*2 model: qwen-14b-chat
🐛 Describe the bug
I added a print statement in the following code.
```python
# vllm/model_executor/layers/sampler.py, lines 53-58
assert logits is not None
_, vocab_size = logits.shape
print(torch.mean(logits).cpu())  # I added my code here
# Apply min_tokens penalty which sets stop tokens to -inf if min_tokens
# have not been generated yet
logits = _apply_min_tokens_penalty(logits, sampling_metadata)
```
Even when using the same decoding parameters, the output logits still change when I increase tensor-parallel-size from 1 to 2.
I use `seed=1024` during generation.
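(For context, a likely explanation: with tensor parallelism the matmuls behind the logits are sharded across GPUs and the partial results are reduced, and because floating-point addition is not associative, a different reduction grouping can round differently. This is a minimal sketch of that effect, not vLLM code; the shard split below only imitates a 2-way tensor-parallel reduction.)

```python
import numpy as np

# Floating-point addition is not associative, so regrouping a
# reduction changes the rounded result even with identical inputs.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False

rng = np.random.default_rng(1024)
x = rng.standard_normal(4096).astype(np.float16)

# Single-device style: one sum over the full vector.
full_sum = np.sum(x, dtype=np.float16)

# Tensor-parallel style: each "GPU" sums its own shard, then the
# partial sums are reduced. Same numbers, different grouping.
shards = np.array_split(x, 2)
tp_sum = np.float16(sum(np.sum(s, dtype=np.float16) for s in shards))

print(full_sum, tp_sum)  # may differ in the last bits
```

Small differences of this kind are expected; they only matter if they are large enough to flip which token is sampled.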
How large is the difference, and can you show a repro script?