odrobnik opened 6 months ago
I second this one. This is sorely missing.
Not sure if we should use the native Ollama API rather than the OpenAI compatibility layer, as it seems to have `prompt_eval_count` (`input_tokens`) and `eval_count` (`output_tokens`) in the final response.
I am okay with creating a custom adapter for Ollama with its native API, but not sure if that aligns with Ollama's focus or direction.
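For reference, here is a minimal sketch (not an actual lobe-chat adapter) of how those counts could be read from Ollama's native `/api/chat` streaming endpoint. The model name and the local default URL are assumptions:

```ts
// Sketch: read token counts from Ollama's native /api/chat streaming endpoint.
// The final NDJSON object has done: true and carries the eval counts.
interface OllamaChatChunk {
  message?: { role: string; content: string };
  done: boolean;
  prompt_eval_count?: number; // would map to input_tokens
  eval_count?: number;        // would map to output_tokens
}

async function streamOllamaChat(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3", // assumed locally pulled model
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Ollama streams newline-delimited JSON objects.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk: OllamaChatChunk = JSON.parse(line);
      if (chunk.message) process.stdout.write(chunk.message.content);
      if (chunk.done) {
        console.log("\ninput_tokens:", chunk.prompt_eval_count);
        console.log("output_tokens:", chunk.eval_count);
      }
    }
  }
}
```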
Can this issue now be closed since this has been merged? https://github.com/lobehub/lobe-chat/issues/3179
In streaming mode the OpenAI chat completion API has a new parameter to include usage information after the chunks. You just add `"stream_options": { "include_usage": true }` to the request. Then the final chunk will look roughly like the example below.
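For illustration, the usage chunk has roughly this shape (all values below are placeholders, not from a real response):

```ts
// Illustrative final chunk when stream_options.include_usage is set.
const finalChunk = {
  id: "chatcmpl-xyz",              // placeholder id
  object: "chat.completion.chunk",
  created: 1717000000,             // placeholder timestamp
  model: "gpt-4o-mini",            // whichever model was requested
  choices: [],                     // empty: this chunk carries no delta
  usage: {
    prompt_tokens: 25,
    completion_tokens: 120,
    total_tokens: 145,
  },
};
```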
The final chunk contains no `choices`, but a `usage` object. This usage covers all the generations from this stream.
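For completeness, a quick sketch of how this can be consumed with the official `openai` Node SDK (the model name is only an example):

```ts
import OpenAI from "openai";

// Minimal sketch: request usage reporting on a streamed chat completion.
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini", // example model
    messages: [{ role: "user", content: "Say hello" }],
    stream: true,
    stream_options: { include_usage: true },
  });

  for await (const chunk of stream) {
    // Content deltas arrive in chunk.choices; the usage-only chunk at the
    // end has an empty choices array and a populated usage field.
    if (chunk.usage) {
      console.log("prompt_tokens:", chunk.usage.prompt_tokens);
      console.log("completion_tokens:", chunk.usage.completion_tokens);
      console.log("total_tokens:", chunk.usage.total_tokens);
    }
  }
}

main();
```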