Closed: slyt closed this issue 2 months ago
Here's an example of parsing the metrics from the API:

import requests
import json

data = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": True}
response = requests.post("http://localhost:11434/api/generate", json=data)
response.raise_for_status()

metrics = ["total_duration", "load_duration", "prompt_eval_duration", "eval_count", "eval_duration"]

# decode response.content from bytes; each line is one JSON object
content = response.content.decode("utf-8")
for line in content.split("\n"):
    if not line.strip():  # ignore empty lines
        continue
    content_json = json.loads(line)
    done = content_json.get("done", None)
    print(line)
    if done:  # the final chunk carries the metrics
        metrics_dict = {metric: content_json.get(metric, None) for metric in metrics}
        print(metrics_dict)
This works for streaming and non-streaming responses.
Output is:
{'total_duration': 8037858768, 'load_duration': 207136, 'prompt_eval_duration': 235696000, 'eval_count': 464, 'eval_duration': 7801437000}
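Since the `*_duration` fields are reported in nanoseconds, you can derive throughput directly from the final chunk's metrics. A quick sketch using the values printed above (the field names come from the response; the conversion constants are just arithmetic):

```python
# Metrics copied from the final streamed chunk printed above.
# All *_duration fields are in nanoseconds.
metrics = {
    "total_duration": 8037858768,
    "load_duration": 207136,
    "prompt_eval_duration": 235696000,
    "eval_count": 464,
    "eval_duration": 7801437000,
}

NS_PER_SECOND = 1_000_000_000

# eval_count tokens were generated over eval_duration nanoseconds
tokens_per_second = metrics["eval_count"] / (metrics["eval_duration"] / NS_PER_SECOND)
total_seconds = metrics["total_duration"] / NS_PER_SECOND

print(f"{tokens_per_second:.1f} tokens/s over {total_seconds:.2f}s total")
```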
The metrics are returned in the top-level JSON object. Here's the equivalent using the client:
from ollama import chat
metrics = ['total_duration', 'load_duration', 'prompt_eval_duration', 'eval_count', 'eval_duration']
r = chat('llama3', [{'role': 'user', 'content': 'Hello, world!'}])
print({metric: r[metric] for metric in metrics})
# Prints:
# {'total_duration': 1325679458, 'load_duration': 538143625, 'prompt_eval_duration': 117545000, 'eval_count': 41, 'eval_duration': 667624000}
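With stream=True on the client, the metrics ride along on the final chunk (the one where done is set), just as in the raw-API example above. A minimal sketch of collecting them — the chunks here are simulated dicts standing in for a live server, and metrics_from_stream is a hypothetical helper, not part of the library:

```python
def metrics_from_stream(chunks, keys):
    """Return the timing fields from the final (done=True) chunk of a stream."""
    collected = {}
    for chunk in chunks:
        if chunk.get("done"):
            collected = {k: chunk.get(k) for k in keys}
    return collected

# Simulated chunks, shaped like what chat(model, messages, stream=True) yields;
# only the last chunk carries the metrics.
chunks = [
    {"message": {"content": "Hello"}, "done": False},
    {"message": {"content": "!"}, "done": False},
    {"message": {"content": ""}, "done": True,
     "total_duration": 1325679458, "eval_count": 41, "eval_duration": 667624000},
]

print(metrics_from_stream(chunks, ["total_duration", "eval_count", "eval_duration"]))
```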
The Ollama API has a Metrics object that appears to be returned with both the ChatResponse and GenerateResponse.
I can't find a way to get these metrics using a client with ollama-python. Is it supported?
I would expect something like this to work:
Currently, the above code outputs:
metrics: {}