janhq / cortex.llamacpp

cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
GNU Affero General Public License v3.0

support OpenAI compatible log probs #276

Closed nguyenhoangthuan99 closed 2 weeks ago

nguyenhoangthuan99 commented 2 weeks ago

Fixes #262

You can test this feature with the openai Python library:

from openai import OpenAI
ENDPOINT = "http://localhost:3928/v1"
MODEL = "meta-llama3.1-8b-instruct"

client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"  # the local server does not validate the API key
)
completion_payload = {
    "messages": [
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
}
response = client.chat.completions.create(
    top_p=0.9,
    temperature=0.6,
    model=MODEL,
    messages=completion_payload["messages"],
    top_logprobs=2,
    stream=False,
    logprobs=True
)
print(response)
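With logprobs=True and top_logprobs=2, each generated token in the response carries its log probability plus the two most likely alternatives, following the OpenAI chat-completions schema. A minimal sketch of walking that structure, using a plain dict in the OpenAI-compatible shape (the token strings and logprob values below are made up for illustration, not real engine output):

```python
import math

# Illustrative response fragment in the OpenAI-compatible logprobs shape
# (field names follow the OpenAI spec; values are invented for this example).
response_json = {
    "choices": [{
        "logprobs": {
            "content": [
                {"token": "The", "logprob": -0.05,
                 "top_logprobs": [{"token": "The", "logprob": -0.05},
                                  {"token": "In", "logprob": -3.10}]},
                {"token": "Dodgers", "logprob": -0.40,
                 "top_logprobs": [{"token": "Dodgers", "logprob": -0.40},
                                  {"token": "Los", "logprob": -1.20}]},
            ]
        }
    }]
}

for entry in response_json["choices"][0]["logprobs"]["content"]:
    prob = math.exp(entry["logprob"])  # convert log probability back to probability
    alts = ", ".join(f'{t["token"]}={t["logprob"]:.2f}'
                     for t in entry["top_logprobs"])
    print(f'{entry["token"]!r}: p={prob:.3f} (top: {alts})')
```

When using the client above, the same fields are available as attributes, e.g. response.choices[0].logprobs.content.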