huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference

'details' in /v1/chat/completions endpoint missing #1656

Closed: daz-williams closed this issue 5 months ago

daz-williams commented 6 months ago

System Info

'details' in /v1/chat/completions endpoint missing

This works:

stream_url ="localhost:8000/generate_stream"

payload = {
    "inputs": prompt,
    "parameters": {
        "stream": True,
        "details": True,
    },
}

and correctly returns 'details' in the final chunk:

data:{"index":49,"token":{"id":32000,"text":"<|im_end|>","logprob":-0.91845703,"special":true},"generated_text":"Example answer","details":{"finish_reason":"eos_token","generated_tokens":49,"seed":3169457846579174189}}

but this endpoint does not:

stream_url = "localhost:8000/v1/chat/completions"

payload = {
    "messages": prompt,
    "details": True,   # neither of these work
    "parameters": {
        "details": True,   # neither of these work
    },
}

Reviewing server.rs at line 675, I can see that 'details' is set to true by default, so in theory it should be included?

    // build the request passing some parameters
    let generate_request = GenerateRequest {
        inputs: inputs.to_string(),
        parameters: GenerateParameters {
            best_of: None,
            temperature: req.temperature,
            repetition_penalty,
            frequency_penalty: req.frequency_penalty,
            top_k: None,
            top_p: req.top_p,
            typical_p: None,
            do_sample: true,
            max_new_tokens,
            return_full_text: None,
            stop: Vec::new(),
            truncate: None,
            watermark: false,
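            // note: 'details' is hardcoded to true regardless of the request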
            details: true,
            decoder_input_details: !stream,
            seed,
            top_n_tokens: None,
            grammar: tool_grammar.clone(),
        },
    };

TGI version: 1.4.3, via the official Docker image.

It's an additional, separate field, so it would not interfere with the standard OpenAI format.

Reproduction

stream_url = "localhost:8000/v1/chat/completions"

payload = {
    "messages": prompt,
    "details": True,   # neither of these work
    "parameters": {
        "details": True,   # neither of these work
    },
}

Expected behavior

Expected details && generated_tokens in the response:

data:{"index":49,"token":{"id":32000,"text":"<|im_end|>","logprob":-0.91845703,"special":true},"generated_text":"Example answer","details":{"finish_reason":"eos_token","generated_tokens":49,"seed":3169457846579174189}}

drbh commented 6 months ago

Hi @daz-williams, thank you for using TGI and opening this issue. However, this is the intended functionality, since 'details' is not a concept in the chat API.

The /v1/chat/completions endpoint returns a ChatCompletionChunk or a ChatCompletion response, depending on whether you're streaming.

The ChatCompletion response includes choices.logprobs, choices.finish_reason, and usage, and the ChatCompletionChunk includes finish_reason and logprobs; all of these draw on information from details.
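
For reference, a minimal sketch of reading those fields from a non-streaming chat response (assuming TGI at localhost:8000; the field names follow the OpenAI chat schema, and the mapping comments reflect the description above):

import requests

# Non-streaming chat completion; the information carried by 'details'
# on the generate endpoints surfaces through OpenAI-style fields here.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "tgi",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
).json()

choice = resp["choices"][0]
print(choice["finish_reason"])             # analogous to details.finish_reason
print(resp["usage"]["completion_tokens"])  # analogous to details.generated_tokens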

Is there specific data you need from the chat endpoint?

drbh commented 5 months ago

Closing this issue, as this is the expected functionality (described above).