vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[feat] vLLM generation deterministic option/flag #2910

Open PeterSH6 opened 7 months ago

PeterSH6 commented 7 months ago

Hi vllm maintainers,

Thanks for the awesome project!

I'm wondering: is there a deterministic option/flag to make the model generate identical results across different runs with the same prompts? (This should also cover the random and beam-search samplers, not only the greedy sampler.) Is it enough to get deterministic behavior by setting the following random state? I'm not sure what other factors might violate determinism.

import random
import numpy as np
import torch

torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
torch.use_deterministic_algorithms(True)

CC: @WoosukKwon @zhuohan123 @Yard1

simon-mo commented 7 months ago

This is already supported through the sampling params and the OpenAI-compatible API as of v0.3.2.
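For example, a per-request seed can be passed through SamplingParams roughly like this (the model name and prompt below are just placeholders):

from vllm import LLM, SamplingParams

# Same seed + same prompt should yield the same sample across runs.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, seed=42)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)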

simon-mo commented 7 months ago

Closed by #2514

PeterSH6 commented 7 months ago

It seems that the latest version supports a per-request seed, but it may still be non-deterministic.

When using torch.use_deterministic_algorithms(True), PyTorch raises an error because the cumsum() operation in https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/sampler.py#L205 does not have a deterministic CUDA kernel.

Therefore, the current version may not be truly deterministic. Is it possible to bypass this operation? @simon-mo
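A small sketch of the failure mode (it needs a CUDA device, and assumes the installed PyTorch build flags cumsum as nondeterministic under this setting):

import torch

torch.use_deterministic_algorithms(True)

# Stand-in for the sampler's probability tensor; the shape is illustrative.
probs = torch.rand(4, 32000, device="cuda")

# The sampler cumulatively sums probabilities before sampling; under
# deterministic mode this call raises a RuntimeError because cumsum has
# no deterministic CUDA implementation.
probs.cumsum(dim=-1)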

simon-mo commented 7 months ago

Good point. It seems this is still unresolved on the PyTorch side: https://github.com/pytorch/pytorch/issues/75240