vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai

Error when prompt_logprobs + enable_prefix_caching #3251

Open bgyoon opened 7 months ago

bgyoon commented 7 months ago
  File "vllm/model_executor/layers/sampler.py", line 98, in forward
    logits.div_(sampling_tensors.temperatures.unsqueeze_(dim=1))
RuntimeError: The size of tensor a (5) must match the size of tensor b (117) at non-singleton dimension 0
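For illustration, the failing line boils down to an in-place broadcast like the sketch below (a minimal standalone reproduction with the shapes taken from the error message above, not the actual vLLM code path):

```python
import torch

vocab_size = 32000

# Only the 5 non-cached prompt tokens produced logits in this step...
logits = torch.randn(5, vocab_size)

# ...but the sampling tensors were built for all 117 prompt tokens.
temperatures = torch.full((117,), 0.8)

# Same pattern as sampler.py line 98; raises:
# RuntimeError: The size of tensor a (5) must match the size of tensor b (117)
# at non-singleton dimension 0
logits.div_(temperatures.unsqueeze_(dim=1))
```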

I think the problem is that the first 112 prompt tokens (16 × 7 blocks) are prefix-cached, so only the last 5 input tokens are actually computed. To return prompt logprobs, the sampler expects logits for all 117 prompt tokens, but only the 5 freshly computed logits are available there. It seems the logits for the 112 cached tokens would need to be returned as well, but I don't know how...
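For reference, a minimal script along these lines should hit the same code path (the model name and prompts are placeholders; the relevant parts are enabling prefix caching, requesting prompt_logprobs, and sending a second request that shares a long prefix so some blocks are served from the cache):

```python
from vllm import LLM, SamplingParams

# enable_prefix_caching so shared prompt blocks are reused across requests
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

# prompt_logprobs asks the sampler for logprobs of every prompt token
params = SamplingParams(temperature=0.8, max_tokens=16, prompt_logprobs=1)

long_prefix = "word " * 200  # long enough to span several KV-cache blocks

# First request populates the prefix cache.
llm.generate([long_prefix + "first question"], params)

# Second request reuses the cached prefix blocks; only the suffix tokens are
# recomputed, which is where the logits/sampling-tensor shape mismatch appears.
llm.generate([long_prefix + "second question"], params)
```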

DouHappy commented 7 months ago

Same error.

thefirebanks commented 6 months ago

Same error here!