kubeagi / arcadia

A diverse, simple, and secure one-stop LLMOps platform
http://www.kubeagi.com/
Apache License 2.0

Support returning token usage for stream mode in the API #1017

Open Abirdcfly opened 2 months ago

Abirdcfly commented 2 months ago

Currently, the FastChat backend does not return the number of tokens consumed in stream mode, but our API gateway needs this number for billing and metrics.

Therefore, we intend to provide an API that reports the token counts for each response.

Initial design is as follows:

  1. The arcadia backend will add an X-Request-ID header to each response, a unique identifier for each request.
  2. arcadia will provide a GET API, /sum-tokens, that requires no authentication or authorization and returns the token counts for the response that corresponds to a given request ID. An example request and response:
    GET /sum-tokens?id=xxxx
    {
      "prompt_tokens": 9,
      "completion_tokens": 12,
      "total_tokens": 21
    }

Abirdcfly commented 2 months ago

cc @wojesen @nkwangleiGIT @bjwswang we need more discussion here.