Open dan-homebrew opened 1 day ago
@nguyenhoangthuan99 - Please transfer https://github.com/janhq/internal/issues/160 to this issue (can be public)
API reference: https://platform.openai.com/docs/api-reference/chat/create
Missing supported fields from /v1/chat/completions
API:
- store (boolean or null, Optional, Defaults to false): Whether or not to store the output of this chat completion request for use in our model distillation or evals products. To support this, we should come up with an architecture to save and store the output of users' chat completion requests (e.g. MinIO for storage and Postgres for the DB).
- metadata (object or null, Optional): Developer-defined tags and values used for filtering completions in the dashboard. This also requires some logic to save results to the DB so that users can query them later.
- logit_bias (map, Optional, Defaults to null): Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. -> Need to confirm whether llama.cpp supports this, but it would be a nice-to-have feature. Issue: https://github.com/janhq/cortex.llamacpp/issues/263
- logprobs (boolean or null, Optional, Defaults to false): Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token in the content of message. This feature is partially supported; cortex.llamacpp needs to be updated to return logprobs in both stream and non-stream modes (a request sketch follows the example response below). Issue: https://github.com/janhq/cortex.llamacpp/issues/262 The result should look like this:
```json
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1702685778,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
},
"logprobs": {
"content": [
{
"token": "Hello",
"logprob": -0.31725305,
"bytes": [72, 101, 108, 108, 111],
"top_logprobs": [
{
"token": "Hello",
"logprob": -0.31725305,
"bytes": [72, 101, 108, 108, 111]
},
{
"token": "Hi",
"logprob": -1.3190403,
"bytes": [72, 105]
}
]
},
{
"token": "!",
"logprob": -0.02380986,
"bytes": [
33
],
"top_logprobs": [
{
"token": "!",
"logprob": -0.02380986,
"bytes": [33]
},
{
"token": " there",
"logprob": -3.787621,
"bytes": [32, 116, 104, 101, 114, 101]
}
]
},
{
"token": " How",
"logprob": -0.000054669687,
"bytes": [32, 72, 111, 119],
"top_logprobs": [
{
"token": " How",
"logprob": -0.000054669687,
"bytes": [32, 72, 111, 119]
},
{
"token": "<|end|>",
"logprob": -10.953937,
"bytes": null
}
]
},
{
"token": " can",
"logprob": -0.015801601,
"bytes": [32, 99, 97, 110],
"top_logprobs": [
{
"token": " can",
"logprob": -0.015801601,
"bytes": [32, 99, 97, 110]
},
{
"token": " may",
"logprob": -4.161023,
"bytes": [32, 109, 97, 121]
}
]
},
{
"token": " I",
"logprob": -3.7697225e-6,
"bytes": [
32,
73
],
"top_logprobs": [
{
"token": " I",
"logprob": -3.7697225e-6,
"bytes": [32, 73]
},
{
"token": " assist",
"logprob": -13.596657,
"bytes": [32, 97, 115, 115, 105, 115, 116]
}
]
},
{
"token": " assist",
"logprob": -0.04571125,
"bytes": [32, 97, 115, 115, 105, 115, 116],
"top_logprobs": [
{
"token": " assist",
"logprob": -0.04571125,
"bytes": [32, 97, 115, 115, 105, 115, 116]
},
{
"token": " help",
"logprob": -3.1089056,
"bytes": [32, 104, 101, 108, 112]
}
]
},
{
"token": " you",
"logprob": -5.4385737e-6,
"bytes": [32, 121, 111, 117],
"top_logprobs": [
{
"token": " you",
"logprob": -5.4385737e-6,
"bytes": [32, 121, 111, 117]
},
{
"token": " today",
"logprob": -12.807695,
"bytes": [32, 116, 111, 100, 97, 121]
}
]
},
{
"token": " today",
"logprob": -0.0040071653,
"bytes": [32, 116, 111, 100, 97, 121],
"top_logprobs": [
{
"token": " today",
"logprob": -0.0040071653,
"bytes": [32, 116, 111, 100, 97, 121]
},
{
"token": "?",
"logprob": -5.5247097,
"bytes": [63]
}
]
},
{
"token": "?",
"logprob": -0.0008108172,
"bytes": [63],
"top_logprobs": [
{
"token": "?",
"logprob": -0.0008108172,
"bytes": [63]
},
{
"token": "?\n",
"logprob": -7.184561,
"bytes": [63, 10]
}
]
}
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 9,
"total_tokens": 18,
"completion_tokens_details": {
"reasoning_tokens": 0
}
},
"system_fingerprint": null
}
```
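For reference, a request that exercises logprobs and logit_bias might look like the sketch below. The shapes follow the OpenAI reference linked above (top_logprobs is the companion parameter there that controls how many alternatives are returned per token); the token ID in logit_bias is a placeholder, and actual behavior depends on the cortex.llamacpp issues above.

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "logprobs": true,
  "top_logprobs": 2,
  "logit_bias": {
    "15339": -100
  }
}
```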
- n (integer or null, Optional, Defaults to 1): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs. -> Need to check whether llama.cpp supports this option. Issue: https://github.com/janhq/cortex.llamacpp/issues/264
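For illustration, a request asking for two choices would look like the sketch below; the response would then carry two entries in choices (index 0 and 1). Whether llama.cpp can serve n > 1 per request is exactly what the linked issue needs to confirm.

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Suggest a name for a cat." }
  ],
  "n": 2
}
```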
- service_tier (string or null, Optional, Defaults to auto): Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
  - If set to 'auto', and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted.
  - If set to 'auto', and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
  - If set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
  - When not set, the default behavior is 'auto'.
  When this parameter is set, the response body will include the service_tier utilized.
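As a sketch of the request shape (these are OpenAI semantics; cortex.cpp would need its own notion of tiers for this to be meaningful):

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "service_tier": "default"
}
```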
- stream_options (object or null, Optional, Defaults to null): Options for the streaming response. Only set this when you set stream: true. -> cortex.llamacpp needs to be updated to support this. Issue: https://github.com/janhq/cortex.llamacpp/issues/265
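For example, a streaming request that also asks for usage stats in the final chunk might look like this (a sketch following the OpenAI shape, where include_usage is the documented stream option):

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```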
- modalities and audio: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-modalities. We need a roadmap to support multimodality for audio.
- user: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-user.
The following fields cannot be supported directly with cortex.cpp and need a roadmap for them in the enterprise version (a request sketch using store, metadata, and user follows this list):
- store (boolean or null, Optional, Defaults to false): Whether or not to store the output of this chat completion request for use in our model distillation or evals products. To support this, we should come up with an architecture to save and store the output of users' chat completion requests (e.g. MinIO for storage and Postgres for the DB).
- metadata (object or null, Optional): Developer-defined tags and values used for filtering completions in the dashboard. This also requires some logic to save results to the DB so that users can query them later.
- service_tier (string or null, Optional, Defaults to auto): Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
  - If set to 'auto', and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted.
  - If set to 'auto', and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
  - If set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
  - When not set, the default behavior is 'auto'.
  When this parameter is set, the response body will include the service_tier utilized.
- modalities and audio: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-modalities. We need a roadmap to support multimodality for audio.
- user: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-user.
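A request exercising store, metadata, and user might look like the sketch below (field shapes follow the OpenAI reference; the metadata keys and user ID are made-up placeholders). On our side, this is roughly the data that would need to be persisted, e.g. the request/response payloads in MinIO and the metadata/user tags in Postgres, so completions can be filtered and queried later.

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "store": true,
  "metadata": {
    "project": "demo",
    "environment": "staging"
  },
  "user": "user-1234"
}
```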
- [x] Linked documentation to this issue in PR #1589
Goal

- /chat/completions should have parameters similar to OpenAI
- Planning: create roadmap issues for any parameter that is not supported yet

Tasklist