lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Feature: Add OpenAI Usage stats when using streaming with the Chat Completions API or Completions API #3360

Open douxiaofeng99 opened 3 months ago

douxiaofeng99 commented 3 months ago

Since OpenAI SDK 1.26.0, usage stats are available when using streaming with the Chat Completions API or the Completions API:

https://community.openai.com/t/usage-stats-now-available-when-using-streaming-with-the-chat-completions-api-or-completions-api/738156

FastChat should support these stats as well.
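For context, the feature announced in the linked post works by passing stream_options={"include_usage": True}: the server then emits one extra final chunk whose choices list is empty and whose usage field is populated. A minimal sketch of parsing such a stream on the client side (the SSE payloads below are illustrative, not captured from a real server):

```python
import json

# Example SSE events as a server might emit them when
# stream_options={"include_usage": True} is set: a normal delta
# chunk (usage is null), then a final chunk whose "choices" is
# empty but which carries the aggregated "usage" object.
sse_events = [
    'data: {"id": "chatcmpl-1", "choices": [{"index": 0, "delta": {"content": "Hi"}}], "usage": null}',
    'data: {"id": "chatcmpl-1", "choices": [], "usage": {"prompt_tokens": 10, "completion_tokens": 2, "total_tokens": 12}}',
    "data: [DONE]",
]

usage = None
for event in sse_events:
    payload = event[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    if chunk.get("usage"):  # only the final chunk has non-null usage
        usage = chunk["usage"]

print(usage)  # → {'prompt_tokens': 10, 'completion_tokens': 2, 'total_tokens': 12}
```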

brandonbiggs commented 3 months ago

For what it's worth, FastChat already reports some usage stats while streaming. Example from the FastChat OpenAI API: "usage": {"prompt_tokens": 591, "total_tokens": 674, "completion_tokens": 83}
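As a quick sanity check, the numbers in that example are internally consistent with how the OpenAI usage object is defined (total = prompt + completion):

```python
# Usage object copied from the example response above.
usage = {"prompt_tokens": 591, "total_tokens": 674, "completion_tokens": 83}

# total_tokens should equal prompt_tokens + completion_tokens.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```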

douxiaofeng99 commented 3 months ago

Thanks for the reply. Could you tell me which version supports usage stats? Then you can close this issue!

brandonbiggs commented 3 months ago

I'm using the newest version, but I think it's been available for a little bit. https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/openai_api_server.py#L743

douxiaofeng99 commented 2 months ago

I reviewed the code and found that the linked line belongs to the embeddings endpoints (@app.post("/v1/embeddings", dependencies=[Depends(check_api_key)]) and @app.post("/v1/engines/{model_name}/embeddings", dependencies=[Depends(check_api_key)])). In the issue above, I mean the Chat Completions API or Completions API. We also downloaded the latest package and confirmed there are no usage stats when using streaming mode.

douxiaofeng99 commented 2 months ago

@brandonbiggs any progress?

brandonbiggs commented 2 months ago

Sorry, any progress on what? I get stats when calling mine. Not sure why you don't.

douxiaofeng99 commented 2 months ago

"I get stats when calling mine. Not sure why you don't." Do you use streaming mode (SSE)? If so, I will test again.

douxiaofeng99 commented 2 months ago

Can anyone help? In https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/openai_api_server.py, the /v1/chat/completions endpoint returns no usage stats in the response when streaming is used.

douxiaofeng99 commented 1 month ago

@brandonbiggs @tmm1 Hi, I studied the FastChat code carefully, and I am sure FastChat does not include usage stats in streaming mode. You could add the option by referring to https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_completion.py. Thanks
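The vLLM approach referenced above boils down to emitting one extra SSE chunk, after the last delta, that has an empty choices list and a populated usage object. A minimal server-side sketch of that shape (the function name and the one-token-per-chunk counting are hypothetical simplifications, not FastChat or vLLM code):

```python
import json
from typing import Iterable, Iterator

def stream_with_usage(text_chunks: Iterable[str], prompt_tokens: int) -> Iterator[str]:
    """Hypothetical SSE generator: yields normal delta chunks with
    usage set to null, then a final chunk carrying the aggregated
    usage object, then the [DONE] sentinel. Token counting is
    simplified to one token per text chunk for illustration."""
    completion_tokens = 0
    for piece in text_chunks:
        completion_tokens += 1
        chunk = {"choices": [{"index": 0, "delta": {"content": piece}}], "usage": None}
        yield f"data: {json.dumps(chunk)}\n\n"
    final = {
        "choices": [],  # empty by convention for the usage-only chunk
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"

events = list(stream_with_usage(["Hello", " world"], prompt_tokens=5))
```

The key design point is that clients which ignore the extra chunk keep working, since a chunk with empty choices simply produces no delta.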

image
douxiaofeng99 commented 1 month ago

Very disappointed that no one has responded to the requested feature.