simonw/llm

Access large language models from the command-line
https://llm.datasette.io

Log input tokens, output tokens and token details #642

Closed: simonw closed this 5 days ago

simonw commented 6 days ago

Refs:

TODO:

simonw commented 6 days ago

I'm going to omit the token information from the llm logs Markdown output unless the user specifies -u/--usage (I'll keep it in the JSON output by default, though).
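
A sketch of the intended behavior (flag names as described above; exact output shapes may differ):

llm logs -u      # Markdown output with token usage included
llm logs         # Markdown output, token usage omitted
llm logs --json  # JSON output, which keeps the usage data by default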

simonw commented 5 days ago

The end of the llm logs -u output now:

...

Example Command:

If you have a SQLite database named texts.db with a table documents containing a text column content, the command would look like this:

llm embed-multi my-texts \
  --sql "SELECT id, content FROM documents" \
  --model ada-002 \
  --store

Replace ada-002 with the embedding model that you wish to use for processing the text. Adjust the SQL query to fit your actual table structure.

This will process all entries in the documents table and store the embeddings in the my-texts collection.

Token usage:

30,791 input, 30,791 output, {"prompt_tokens_details": {"cached_tokens": 30592}}
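
For context, a model plugin records these numbers on the response object. A minimal sketch of that call, assuming set_usage() also takes a details dict for the extra token breakdown (the details keyword and the usage key names follow the OpenAI-style payload above and are assumptions here):

# Inside a model plugin's execute() method, after the API responds:
response.set_usage(
    input=usage["prompt_tokens"],         # e.g. 30,791 above
    output=usage["completion_tokens"],
    details={"prompt_tokens_details": usage["prompt_tokens_details"]},
)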

simonw commented 5 days ago

With this diff applied, llm-claude-3 logged token counts correctly:

diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..281084e 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -240,16 +240,23 @@ class ClaudeMessages(_Shared, llm.Model):
     def execute(self, prompt, stream, response, conversation):
         client = Anthropic(api_key=self.get_key())
         kwargs = self.build_kwargs(prompt, conversation)
+        usage = None
         if stream:
             with client.messages.stream(**kwargs) as stream:
                 for text in stream.text_stream:
                     yield text
                 # This records usage and other data:
                 response.response_json = stream.get_final_message().model_dump()
+                usage = response.response_json.pop("usage")
         else:
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+            usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )

 class ClaudeMessagesLong(ClaudeMessages):
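
With that patch applied, the recorded counts should also be readable from the Python API. A sketch, assuming response.usage() is the accessor for the stored values and that llm-claude-3 registers a claude-3.5-sonnet model ID:

import llm

model = llm.get_model("claude-3.5-sonnet")  # model ID assumed
response = model.prompt("Say hello")
print(response.text())
print(response.usage())  # assumed accessor for the logged input/output tokens
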
simonw commented 5 days ago

Better Claude diff:

diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..0a6e236 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -231,6 +231,13 @@ class _Shared:
             kwargs["extra_headers"] = self.extra_headers
         return kwargs

+    def set_usage(self, response):
+        usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )
+
     def __str__(self):
         return "Anthropic Messages: {}".format(self.model_id)

@@ -250,6 +257,7 @@ class ClaudeMessages(_Shared, llm.Model):
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+        self.set_usage(response)

 class ClaudeMessagesLong(ClaudeMessages):
@@ -270,6 +278,7 @@ class AsyncClaudeMessages(_Shared, llm.AsyncModel):
             completion = await client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+        self.set_usage(response)

 class AsyncClaudeMessagesLong(AsyncClaudeMessages):
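
The set_usage() helper on _Shared pops "usage" out of response_json and forwards it to response.set_usage(), so the sync and async execute() methods share a single code path. With the patched plugin installed, a quick end-to-end check (model ID assumed, as above):

llm -m claude-3.5-sonnet 'Say hello'
llm logs -u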