I'm going to omit the token information from `llm logs` markdown output unless the user specifies `-u/--usage` (I'll keep it in the JSON output by default though).
End of the output of `llm logs -u` now:

```
...
Example Command:

If you have a SQLite database named `texts.db` with a table `documents` containing a text column `content`, the command would look like this:

    llm embed-multi my-texts \
      --sql "SELECT id, content FROM documents" \
      --model ada-002 \
      --store

Replace `ada-002` with the embedding model that you wish to use for processing the text. Adjust the SQL query to fit your actual table structure. This will process all entries in the `documents` table and store the embeddings in the `my-texts` collection.

Token usage:

30,791 input, 30,791 output, {"prompt_tokens_details": {"cached_tokens": 30592}}
```
This diff to `llm-claude-3` logged token counts correctly:
```diff
diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..281084e 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -240,16 +240,23 @@ class ClaudeMessages(_Shared, llm.Model):
     def execute(self, prompt, stream, response, conversation):
         client = Anthropic(api_key=self.get_key())
         kwargs = self.build_kwargs(prompt, conversation)
+        usage = None
         if stream:
             with client.messages.stream(**kwargs) as stream:
                 for text in stream.text_stream:
                     yield text
             # This records usage and other data:
             response.response_json = stream.get_final_message().model_dump()
+            usage = response.response_json.pop("usage")
         else:
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+            usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )


 class ClaudeMessagesLong(ClaudeMessages):
```
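For anyone adapting this pattern to another plugin: the key move is popping the `usage` block out of the raw response JSON (so it isn't stored twice) and handing the counts to `response.set_usage()`. Here's a minimal standalone sketch of that extraction logic; the stub response object is mine, and the `None` default on `.pop()` is my defensive tweak, not something the diff above does:

```python
# A minimal, self-contained sketch of the extraction pattern in the diff
# above. The SimpleNamespace stub stands in for an llm.Response; the
# .pop("usage", None) default is my defensive tweak, not in the diff.
from types import SimpleNamespace


def record_usage(response):
    # Remove the usage block from the raw JSON so it is not stored twice,
    # then hand the token counts to the response object.
    usage = (response.response_json or {}).pop("usage", None)
    if usage:
        response.set_usage(
            input=usage.get("input_tokens"), output=usage.get("output_tokens")
        )


# Quick check with a stand-in response object:
captured = {}
response = SimpleNamespace(
    response_json={"id": "msg_01", "usage": {"input_tokens": 10, "output_tokens": 42}},
    set_usage=lambda input=None, output=None: captured.update(
        input=input, output=output
    ),
)
record_usage(response)
print(captured)                # {'input': 10, 'output': 42}
print(response.response_json)  # {'id': 'msg_01'} - usage has been popped
```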
Better Claude diff:
```diff
diff --git a/llm_claude_3.py b/llm_claude_3.py
index a05b01b..0a6e236 100644
--- a/llm_claude_3.py
+++ b/llm_claude_3.py
@@ -231,6 +231,13 @@ class _Shared:
             kwargs["extra_headers"] = self.extra_headers
         return kwargs

+    def set_usage(self, response):
+        usage = response.response_json.pop("usage")
+        if usage:
+            response.set_usage(
+                input=usage.get("input_tokens"), output=usage.get("output_tokens")
+            )
+
     def __str__(self):
         return "Anthropic Messages: {}".format(self.model_id)

@@ -250,6 +257,7 @@ class ClaudeMessages(_Shared, llm.Model):
             completion = client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+            self.set_usage(response)


 class ClaudeMessagesLong(ClaudeMessages):
@@ -270,6 +278,7 @@ class AsyncClaudeMessages(_Shared, llm.AsyncModel):
             completion = await client.messages.create(**kwargs)
             yield completion.content[0].text
             response.response_json = completion.model_dump()
+            self.set_usage(response)


 class AsyncClaudeMessagesLong(AsyncClaudeMessages):
```
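The reason this version is better: `_Shared` is the mixin that both the sync and async model classes inherit from, so the usage-recording logic lives in exactly one place instead of being copy-pasted into each `execute()`. A toy illustration of that shape (all `Fake*` names here are mine, not from llm or llm-claude-3):

```python
# Toy illustration of the mixin refactor: one helper on the shared base
# class serves both the sync and async execute() paths. Fake* names are
# stand-ins, not part of llm or llm-claude-3.
import asyncio


class FakeResponse:
    def __init__(self, response_json):
        self.response_json = response_json
        self.usage = None

    def set_usage(self, input=None, output=None):
        self.usage = (input, output)


class _SharedSketch:
    def set_usage(self, response):
        usage = response.response_json.pop("usage", None)
        if usage:
            response.set_usage(
                input=usage.get("input_tokens"), output=usage.get("output_tokens")
            )


class FakeSyncModel(_SharedSketch):
    def execute(self, response):
        self.set_usage(response)


class FakeAsyncModel(_SharedSketch):
    async def execute(self, response):
        self.set_usage(response)


r1 = FakeResponse({"usage": {"input_tokens": 1, "output_tokens": 2}})
FakeSyncModel().execute(r1)
print(r1.usage)  # (1, 2)

r2 = FakeResponse({"usage": {"input_tokens": 3, "output_tokens": 4}})
asyncio.run(FakeAsyncModel().execute(r2))
print(r2.usage)  # (3, 4)
```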
Refs #610
TODO:

- `responses` table.
- `llm prompt -u/--usage` option
- `llm logs` output
- `response.set_usage()` ~(datasette-llm package)~ I need to document `Response` generally, will do this in a new issue.
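Since the counts persist to the `responses` table, they can be aggregated with plain SQL once they're logged. A sketch, assuming `input_tokens`/`output_tokens` column names on that table; it locates the database via `llm logs path` rather than hard-coding a platform-specific location:

```python
# Sketch: total logged token usage per model, straight from the logs
# database. Assumes input_tokens/output_tokens columns on the responses
# table; adjust the query if your schema differs.
import sqlite3
import subprocess

# Ask the CLI where its logs database lives instead of hard-coding a path:
logs_db = subprocess.check_output(["llm", "logs", "path"], text=True).strip()

conn = sqlite3.connect(logs_db)
for model, count, input_tokens, output_tokens in conn.execute(
    """
    select model, count(*), sum(input_tokens), sum(output_tokens)
    from responses
    group by model
    order by sum(input_tokens) desc
    """
):
    print(f"{model}: {count} responses, {input_tokens} input, {output_tokens} output")
```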