vllm-project / vllm


Does vllm support do_sample? #699

Open leiwen83 opened 1 year ago

leiwen83 commented 1 year ago

Hi,

Hugging Face Transformers supports various sampling strategies: https://huggingface.co/docs/transformers/main/main_classes/text_generation

- greedy decoding by calling `greedy_search()` if `num_beams=1` and `do_sample=False`
- contrastive search by calling `contrastive_search()` if `penalty_alpha>0.` and `top_k>1`
- multinomial sampling by calling `sample()` if `num_beams=1` and `do_sample=True`
- beam-search decoding by calling `beam_search()` if `num_beams>1` and `do_sample=False`
- beam-search multinomial sampling by calling `beam_sample()` if `num_beams>1` and `do_sample=True`
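For illustration, a minimal sketch of that dispatch in Transformers (the model name here is just a placeholder, not from this issue; any causal LM works):

```python
# Sketch: generate() picks a decoding strategy from its arguments.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Hello, my name is", return_tensors="pt")

# num_beams=1, do_sample=True  -> multinomial sampling (sample())
sampled = model.generate(**inputs, do_sample=True, num_beams=1, max_new_tokens=20)

# num_beams=4, do_sample=True  -> beam-search multinomial sampling (beam_sample())
beam_sampled = model.generate(**inputs, do_sample=True, num_beams=4, max_new_tokens=20)

print(tok.decode(sampled[0], skip_special_tokens=True))
```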

Since vLLM already supports beam search via the `best_of` parameter, how do we combine `do_sample` with `best_of` to get the beam-search multinomial sampling strategy?

Thx

lucasjinreal commented 1 year ago

Same

BaiMoHan commented 1 year ago

I need this support🥺🥺🥺🥺🥺🥺

abdulvirta commented 1 year ago

same

samarthsarin commented 8 months ago

Any update on this feature? @WoosukKwon

latinostats commented 8 months ago

I am also interested in this feature.

Ksuriuri commented 7 months ago

Refer to `_sample` in `vllm/model_executor/layers/sampler.py`: when `sampling_type == SamplingType.RANDOM`, the implementation is the same as `sample()` in Hugging Face Transformers.

So you just need to make your request resolve to `SamplingType.RANDOM`. Per the `sampling_type` property in `vllm/sampling_params.py`, that means setting `use_beam_search=False` and `temperature > 1e-5` when initializing `SamplingParams`.
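For concreteness, a minimal sketch under those settings (the model name and prompt are placeholders; `use_beam_search` is a `SamplingParams` field in vLLM versions from this thread's timeframe):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model

# use_beam_search=False and temperature > 1e-5 => SamplingType.RANDOM,
# i.e. multinomial sampling, the equivalent of HF's do_sample=True.
params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    use_beam_search=False,
)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```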

lauhaide commented 2 months ago

Hello, what is the difference between `SamplingType.RANDOM` and `SamplingType.RANDOM_SEED`? As far as I can follow the code, they seem to be the same. I was not able to find where the `seed` argument of `SamplingParams` is used to set any seed.

Any comments on this?

Thank you in advance!
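A minimal sketch, assuming that setting `seed` is what routes a request to `SamplingType.RANDOM_SEED` and that its intent is reproducible per-request sampling (model name and prompt are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model

# Assumption: with seed set, sampling_type resolves to RANDOM_SEED, so two
# identical requests with the same seed should produce identical samples.
params = SamplingParams(temperature=0.8, seed=42)

out1 = llm.generate(["Once upon a time"], params)
out2 = llm.generate(["Once upon a time"], params)
assert out1[0].outputs[0].text == out2[0].outputs[0].text
```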