huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.73k stars · 1.01k forks

Serverless inference API endpoints fail to return logprobs via chat completions #1852

Closed ggbetz closed 3 months ago

ggbetz commented 4 months ago

System Info

serverless inference endpoints

Reproduction

Querying Mistral via the Messages API with the following snippet ...

from openai import OpenAI
from google.colab import userdata  # Colab secrets store holding the HF token

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2/v1/"
# API_URL = "https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/"

client = OpenAI(
    base_url=API_URL,
    api_key=userdata.get('HF_TOKEN'),
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "What is deep learning?"}
    ],
    logprobs=True,
    top_logprobs=5,
    max_tokens=1,
)

print(chat_completion.choices[0].logprobs)

yields:

>>> ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token=' Deep', bytes=None, logprob=-0.00023388863, top_logprobs=[TopLogprob(token='Deep', bytes=None, logprob=-0.00023388863), TopLogprob(token='Deep', bytes=None, logprob=-8.375), TopLogprob(token='deep', bytes=None, logprob=-12.671875), TopLogprob(token='deep', bytes=None, logprob=-15.421875), TopLogprob(token='deeply', bytes=None, logprob=-19.265625)])])

... which is fine. However, switching to the zephyr endpoint (uncomment the second API_URL line in the snippet) yields:

>>> ChoiceLogprobs(content=[])
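To illustrate the failure mode, a small helper (hypothetical, not part of the OpenAI client) can distinguish a populated logprobs payload from the empty one returned by the zephyr endpoint; the stand-in classes below only mirror the shape of the two responses shown above:

```python
def has_token_logprobs(choice_logprobs) -> bool:
    """True if the logprobs payload contains at least one token entry.

    Works with any object exposing a `.content` list (e.g. the OpenAI
    client's ChoiceLogprobs) and with None.
    """
    return bool(getattr(choice_logprobs, "content", None))


# Minimal stand-ins mirroring the two responses shown above.
class _Logprobs:
    def __init__(self, content):
        self.content = content

mistral_like = _Logprobs(content=["<token entry>"])  # populated, as expected
zephyr_like = _Logprobs(content=[])                  # empty, as in this bug

print(has_token_logprobs(mistral_like))  # True
print(has_token_logprobs(zephyr_like))   # False
```

Guarding on this check makes the difference between the two endpoints explicit before any downstream code indexes into the (possibly empty) content list.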

Expected behavior

Both serverless inference endpoints should return logprobs.
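One way to cross-check whether the backend itself produces logprobs is TGI's native `/generate` route, which accepts `details` and `top_n_tokens` request parameters. A sketch of the request body (the helper name is mine; this only builds the payload, it does not send it):

```python
def build_generate_payload(prompt: str, top_n: int = 5, max_new_tokens: int = 1) -> dict:
    """Request body for TGI's /generate route asking for token-level details."""
    return {
        "inputs": prompt,
        "parameters": {
            "details": True,            # include per-token logprobs in the response
            "top_n_tokens": top_n,      # also return the top-n alternatives per step
            "max_new_tokens": max_new_tokens,
        },
    }

# POST this JSON to https://api-inference.huggingface.co/models/<model>/generate
payload = build_generate_payload("What is deep learning?")
```

If the `/generate` route returns token details for zephyr while the chat completions route returns an empty list, the problem is in the chat completions mapping rather than in the model deployment.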

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.