konveyor / kai

Konveyor AI - static code analysis driven migration to new targets via Generative AI
Apache License 2.0

Capture number of tokens in a request and response when possible #373

Open jwmatthews opened 2 months ago

jwmatthews commented 2 months ago

We've run into a few situations where it would help to have a better view of the number of tokens consumed per request and response.

Let's augment the data we capture for tracing and include any extra info the LLM sends back via 'response_metadata'.

Our current understanding is that for some models, the response includes metadata that breaks out the number of tokens used in the request and in the response.
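A minimal sketch of what that extraction could look like. This assumes an OpenAI-style shape where 'response_metadata' carries a 'token_usage' dict with 'prompt_tokens' and 'completion_tokens' keys (the key names are assumptions based on what some providers return; other models may report nothing at all):

```python
# Sketch: pull token counts out of an LLM response's metadata, when present.
# The 'token_usage' / 'prompt_tokens' / 'completion_tokens' keys are assumed
# from OpenAI-style responses; not every model reports them.
from typing import Optional, Tuple


def extract_token_usage(response_metadata: dict) -> Optional[Tuple[int, int]]:
    """Return (prompt_tokens, completion_tokens) if the model reported them, else None."""
    usage = response_metadata.get("token_usage")
    if not isinstance(usage, dict):
        return None  # provider did not report usage
    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    if prompt is None or completion is None:
        return None
    return int(prompt), int(completion)


# Example metadata shaped like an OpenAI-style response
meta = {"token_usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165}}
print(extract_token_usage(meta))  # -> (120, 45)
print(extract_token_usage({}))    # -> None
```

Returning None rather than raising keeps the tracing path non-fatal for models that omit usage data.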

jwmatthews commented 2 months ago

@devjpt23 has begun to work on this issue. I wasn't yet able to formally assign him to this issue.

Looks like I can only assign issues to folks in the Konveyor Org, so I formed a new 'Collaborators' team and invited @devjpt23 so he can be assigned future issues.

jwmatthews commented 1 month ago

#375 adds the ability to log request/response token usage on successful calls for models that send back a 'token_usage' in response metadata.

We would like to extend the capability beyond what #375 offers.
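One way to extend the coverage, sketched under assumptions: when a model reports no 'token_usage', fall back to a rough estimate so tracing still records something. The helper names and the ~4 characters/token heuristic are hypothetical, not part of #375:

```python
# Hypothetical fallback: if a response carries no 'token_usage' metadata,
# approximate the counts instead of logging nothing.
# The ~4 chars/token ratio is a crude English-text heuristic, not exact.
def estimate_tokens(text: str) -> int:
    """Rough token estimate for providers that report no usage."""
    return max(1, len(text) // 4)


def token_counts(prompt: str, completion: str, metadata: dict) -> dict:
    """Prefer reported usage; mark the result as estimated when we had to guess."""
    usage = metadata.get("token_usage") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", estimate_tokens(prompt)),
        "completion_tokens": usage.get("completion_tokens", estimate_tokens(completion)),
        "estimated": "token_usage" not in metadata,
    }


print(token_counts("Convert this EJB to Quarkus", "Done.", {}))
```

Flagging estimated counts separately keeps them from being mistaken for provider-reported numbers in traces.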