vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

`early_stopping` potentially not working via api request #2938

Closed: Maxusmusti closed this issue 8 months ago

Maxusmusti commented 9 months ago

While using v0.3.1, early_stopping will not toggle to True due to an omission in the protocol definition (see the comments below). I am prompting like this:

import json
import requests

# api_server is assumed to be the base URL of the running vLLM
# OpenAI-compatible server; '<host>:<port>' is a placeholder.
api_server = 'https://<host>:<port>'

headers = {
    'Content-Type': 'application/json',
}

json_data = {
    'model': '/mnt/models/',
    'prompt': ['Something', 'Something'],
    'max_tokens': 128,
    'use_beam_search': True,
    'best_of': 4,
    'temperature': 0,
    'early_stopping': True,
    ###'min_tokens': 30
    ###'stop_token_ids': [50256],
}

response = requests.post(f'{api_server}/v1/completions', headers=headers, json=json_data, verify=False)
print(json.loads(response.text))

and this is what I get on server side:

INFO 02-20 21:36:30 async_llm_engine.py:433] Received request cmpl-0a8493bd1a77481fb2396bb42c6bd9af-1: prompt: None, prefix_pos: None,sampling_params: SamplingParams(n=1, best_of=4, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=True, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: [5195, 407, 30], lora_request: None.

Everything else I set is reflected in the logged SamplingParams, but early_stopping isn't, even though none of my other options should be incompatible with it: https://github.com/vllm-project/vllm/blob/264017a2bf030f060ebad91eb9be9b4e0033edb9/vllm/sampling_params.py#L106

Maxusmusti commented 9 months ago

Additionally, I can set early_stopping to any arbitrary value and it will throw no error, so it seems like it isn't even being passed to SamplingParams.
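
(For illustration, a minimal sketch of why this fails silently, assuming the request model is a Pydantic BaseModel that doesn't declare early_stopping: Pydantic ignores unknown fields by default rather than rejecting them, so any value is dropped without an error. The CompletionRequestSketch class below is hypothetical, not vLLM's actual protocol definition.)

from typing import List, Optional
from pydantic import BaseModel

# Hypothetical stand-in for the real CompletionRequest; note that
# early_stopping is not declared on it.
class CompletionRequestSketch(BaseModel):
    model: str
    prompt: List[str]
    max_tokens: Optional[int] = 16
    use_beam_search: Optional[bool] = False
    best_of: Optional[int] = None
    temperature: Optional[float] = 1.0

# Unknown fields are ignored by default, so any early_stopping value is
# accepted without error and never reaches SamplingParams.
req = CompletionRequestSketch(model='/mnt/models/', prompt=['Something'],
                              early_stopping='banana')
print(hasattr(req, 'early_stopping'))  # False: the field was silently dropped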

Maxusmusti commented 9 months ago

Upon further inspection, it looks like early_stopping has simply been left out of the completion/chat-completion APIs, possibly an oversight: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/protocol.py#L56

Maxusmusti commented 9 months ago

Opened a PR with a potential quick 4-line fix; let me know if anything looks like it's missing! https://github.com/vllm-project/vllm/pull/2939
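
For context, the fix amounts to declaring early_stopping on the OpenAI-compatible request models and forwarding it when SamplingParams is constructed. A rough sketch of that shape (not the actual diff from the PR, with a plain dict standing in for vllm.SamplingParams):

from typing import List, Optional, Union
from pydantic import BaseModel

class CompletionRequest(BaseModel):
    # Trimmed to the fields relevant here; the real model declares many more.
    model: str
    prompt: Union[str, List[str]]
    max_tokens: Optional[int] = 16
    use_beam_search: Optional[bool] = False
    best_of: Optional[int] = None
    temperature: Optional[float] = 1.0
    early_stopping: Optional[bool] = False  # the newly declared field

    def to_sampling_params(self) -> dict:
        # A dict stands in for SamplingParams; the real code would pass
        # early_stopping=self.early_stopping into SamplingParams(...).
        return dict(
            best_of=self.best_of,
            temperature=self.temperature,
            use_beam_search=self.use_beam_search,
            early_stopping=self.early_stopping,
            max_tokens=self.max_tokens,
        )

With the field declared and forwarded, the request from the original report should show early_stopping=True in the logged SamplingParams.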

simon-mo commented 9 months ago

It wasn't originally an oversight, because early_stopping is not part of the official API: https://platform.openai.com/docs/api-reference/chat/create

But it does seem needed. Thank you for your PR!

njhill commented 9 months ago

use_beam_search and length_penalty also aren't part of the official API, so I guess it goes along with those.

hmellor commented 8 months ago

Closed by #2939