janhq / cortex.llamacpp

cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server at runtime.
GNU Affero General Public License v3.0

feat: [support stream_option for OpenAI API compatible] #265

Closed: nguyenhoangthuan99 closed this issue 1 week ago

nguyenhoangthuan99 commented 3 weeks ago

Problem

reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options

related issue: https://github.com/janhq/internal/issues/160
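Per the OpenAI reference linked above, when `stream_options.include_usage` is true the server emits one extra chunk before `data: [DONE]` whose `choices` array is empty and whose `usage` field carries the token counts. A minimal client-side parsing sketch (the SSE lines below are hypothetical sample data, not output from cortex.llamacpp):

```python
import json

# Hypothetical SSE lines as a streaming server might emit them. The final
# chunk before [DONE] carries "usage" and an empty "choices" array when
# stream_options.include_usage is true (per the OpenAI API reference).
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hello"}, "index": 0}]}',
    'data: {"choices": [{"delta": {"content": "!"}, "index": 0}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 9, '
    '"completion_tokens": 2, "total_tokens": 11}}',
    "data: [DONE]",
]

text, usage = [], None
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    if chunk.get("usage") is not None:
        usage = chunk["usage"]  # the extra statistics chunk
    for choice in chunk.get("choices", []):
        text.append(choice["delta"].get("content", ""))

print("".join(text))
print(usage["total_tokens"])
```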

gabrielle-ong commented 1 week ago

✅ QA: verified on cortex.llama-cpp v0.1.37-01.11.24, API request to `v1/chat/completions` with:

```json
"stream_options": {
    "include_usage": false | true
}
```
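A full request body enabling usage reporting might look like the sketch below; the model name and message are placeholders, and note that per the OpenAI API reference `stream_options` is only valid when `stream` is true:

```python
import json

# Hypothetical request body for POST v1/chat/completions; the model name
# and the user message are placeholders, not values from this QA run.
request_body = {
    "model": "llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,                               # required for stream_options
    "stream_options": {"include_usage": True},    # request the final usage chunk
}
print(json.dumps(request_body, indent=2))
```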

include_usage = false:

(screenshot)

include_usage = true: additional chunk before [DONE] showing token usage statistics

(screenshots)