simonw/llm

Access large language models from the command-line
https://llm.datasette.io

Log input tokens, output tokens and token details #642

Closed: simonw closed this 5 days ago

simonw commented 6 days ago

Refs:

TODO:

simonw commented 6 days ago

I'm going to omit the token information from the llm logs Markdown output unless the user specifies -u/--usage (I'll keep it in the JSON output by default, though).
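
A sketch of the intended behavior (flag names as described above; exact output shapes may differ):

llm logs -u      # Markdown output with token usage included
llm logs         # Markdown output, token usage omitted
llm logs --json  # JSON output, which keeps the usage data by default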

simonw commented 5 days ago

The end of the llm logs -u output now:

...

Example Command:

If you have a SQLite database named texts.db with a table documents containing a text column content, the command would look like this:

llm embed-multi my-texts \
  --sql "SELECT id, content FROM documents" \
  --model ada-002 \
  --store

Replace ada-002 with the embedding model that you wish to use for processing the text. Adjust the SQL query to fit your actual table structure.

This will process all entries in the documents table and store the embeddings in the my-texts collection.

Token usage:

30,791 input, 30,791 output, {"prompt_tokens_details": {"cached_tokens": 30592}}
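
For context, a model plugin records these numbers on the response object. A minimal sketch of that call, assuming set_usage() also takes a details dict for the extra token breakdown (the details keyword and the usage key names follow the OpenAI-style payload above and are assumptions here):

# Inside a model plugin's execute() method, after the API responds:
response.set_usage(
    input=usage["prompt_tokens"],         # e.g. 30,791 above
    output=usage["completion_tokens"],
    details={"prompt_tokens_details": usage["prompt_tokens_details"]},
)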

simonw commented 5 days ago

With this diff applied, llm-claude-3 logged token counts correctly:

diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..281084e 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -240,16 +240,23 @@ class ClaudeMessages(_Shared, llm.Model):
     def execute(self, prompt, stream, response, conversation):
         client = Anthropic(api_key=self.get_key())
         kwargs = self.build_kwargs(prompt, conversation)
+        usage = None
         if stream:
             with client.messages.stream(**kwargs) as stream:
                 for text in stream.text_stream:
                     yield text
                 # This records usage and other data:
                 response.response_json = stream.get_final_message().model_dump()
+                usage = response.response_json.pop("usage")
         else:
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+            usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )

 class ClaudeMessagesLong(ClaudeMessages):
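
With that patch applied, the recorded counts should also be readable from the Python API. A sketch, assuming response.usage() is the accessor for the stored values and that llm-claude-3 registers a claude-3.5-sonnet model ID:

import llm

model = llm.get_model("claude-3.5-sonnet")  # model ID assumed
response = model.prompt("Say hello")
print(response.text())
print(response.usage())  # assumed accessor for the logged input/output tokens
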
simonw commented 5 days ago

Better Claude diff:

diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..0a6e236 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -231,6 +231,13 @@ class _Shared:
             kwargs["extra_headers"] = self.extra_headers
         return kwargs

+    def set_usage(self, response):
+        usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )
+
     def __str__(self):
         return "Anthropic Messages: {}".format(self.model_id)

@@ -250,6 +257,7 @@ class ClaudeMessages(_Shared, llm.Model):
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+        self.set_usage(response)

 class ClaudeMessagesLong(ClaudeMessages):
@@ -270,6 +278,7 @@ class AsyncClaudeMessages(_Shared, llm.AsyncModel):
             completion = await client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+        self.set_usage(response)

 class AsyncClaudeMessagesLong(AsyncClaudeMessages):
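
The set_usage() helper on _Shared pops "usage" out of response_json and forwards it to response.set_usage(), so the sync and async execute() methods share a single code path. With the patched plugin installed, a quick end-to-end check (model ID assumed, as above):

llm -m claude-3.5-sonnet 'Say hello'
llm logs -u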