huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.73k stars · 1.01k forks

Serverless inference API endpoints fail to return logprobs via chat completions #1852

Closed ggbetz closed 3 months ago

ggbetz commented 4 months ago

System Info

serverless inference endpoints

Reproduction

Querying Mistral via the Messages API with the following snippet ...

from openai import OpenAI
from google.colab import userdata  # Colab secrets store holding the HF token

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2/v1/"
# API_URL = "https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/"

client = OpenAI(
    base_url=API_URL,
    api_key=userdata.get('HF_TOKEN'),
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "What is deep learning?"}
    ],
    logprobs=True,
    top_logprobs=5,
    max_tokens=1,
)

print(chat_completion.choices[0].logprobs)

yields:

>>> ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token=' Deep', bytes=None, logprob=-0.00023388863, top_logprobs=[TopLogprob(token='Deep', bytes=None, logprob=-0.00023388863), TopLogprob(token='Deep', bytes=None, logprob=-8.375), TopLogprob(token='deep', bytes=None, logprob=-12.671875), TopLogprob(token='deep', bytes=None, logprob=-15.421875), TopLogprob(token='deeply', bytes=None, logprob=-19.265625)])])

... which is fine. However, switching to the zephyr endpoint (uncomment the second API_URL line in the snippet) yields:

>>> ChoiceLogprobs(content=[])
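To illustrate the failure mode, a small helper (hypothetical, not part of the OpenAI client) can distinguish a populated logprobs payload from the empty one returned by the zephyr endpoint; the stand-in classes below only mirror the shape of the two responses shown above:

```python
def has_token_logprobs(choice_logprobs) -> bool:
    """True if the logprobs payload contains at least one token entry.

    Works with any object exposing a `.content` list (e.g. the OpenAI
    client's ChoiceLogprobs) and with None.
    """
    return bool(getattr(choice_logprobs, "content", None))


# Minimal stand-ins mirroring the two responses shown above.
class _Logprobs:
    def __init__(self, content):
        self.content = content

mistral_like = _Logprobs(content=["<token entry>"])  # populated, as expected
zephyr_like = _Logprobs(content=[])                  # empty, as in this bug

print(has_token_logprobs(mistral_like))  # True
print(has_token_logprobs(zephyr_like))   # False
```

Guarding on this check makes the difference between the two endpoints explicit before any downstream code indexes into the (possibly empty) content list.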

Expected behavior

Both serverless inference endpoints should return logprobs.
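One way to cross-check whether the backend itself produces logprobs is TGI's native `/generate` route, which accepts `details` and `top_n_tokens` request parameters. A sketch of the request body (the helper name is mine; this only builds the payload, it does not send it):

```python
def build_generate_payload(prompt: str, top_n: int = 5, max_new_tokens: int = 1) -> dict:
    """Request body for TGI's /generate route asking for token-level details."""
    return {
        "inputs": prompt,
        "parameters": {
            "details": True,            # include per-token logprobs in the response
            "top_n_tokens": top_n,      # also return the top-n alternatives per step
            "max_new_tokens": max_new_tokens,
        },
    }

# POST this JSON to https://api-inference.huggingface.co/models/<model>/generate
payload = build_generate_payload("What is deep learning?")
```

If the `/generate` route returns token details for zephyr while the chat completions route returns an empty list, the problem is in the chat completions mapping rather than in the model deployment.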

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.