Closed: slyt closed this issue 2 months ago
Here's an example of parsing the metrics from the API:

import requests
import json

data = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": True}
response = requests.post("http://localhost:11434/api/generate", json=data)
response.raise_for_status()

metrics = ["total_duration", "load_duration", "prompt_eval_duration", "eval_count", "eval_duration"]

# decode response.content from bytes; each line is one JSON object
content = response.content.decode("utf-8")
for line in content.split("\n"):
    if not line.strip():  # ignore empty lines
        continue
    content_json = json.loads(line)
    done = content_json.get("done", None)
    print(line)
    if done:  # the final chunk carries the metrics
        metrics_dict = {metric: content_json.get(metric, None) for metric in metrics}
        print(metrics_dict)
This works for streaming and non-streaming responses.
Output is:
{'total_duration': 8037858768, 'load_duration': 207136, 'prompt_eval_duration': 235696000, 'eval_count': 464, 'eval_duration': 7801437000}
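Since the `*_duration` fields are reported in nanoseconds, you can derive throughput directly from the final chunk's metrics. A quick sketch using the values printed above (the field names come from the response; the conversion constants are just arithmetic):

```python
# Metrics copied from the final streamed chunk printed above.
# All *_duration fields are in nanoseconds.
metrics = {
    "total_duration": 8037858768,
    "load_duration": 207136,
    "prompt_eval_duration": 235696000,
    "eval_count": 464,
    "eval_duration": 7801437000,
}

NS_PER_SECOND = 1_000_000_000

# eval_count tokens were generated over eval_duration nanoseconds
tokens_per_second = metrics["eval_count"] / (metrics["eval_duration"] / NS_PER_SECOND)
total_seconds = metrics["total_duration"] / NS_PER_SECOND

print(f"{tokens_per_second:.1f} tokens/s over {total_seconds:.2f}s total")
```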
The metrics are returned in the top-level JSON object. Here's the equivalent using the client:
from ollama import chat
metrics = ['total_duration', 'load_duration', 'prompt_eval_duration', 'eval_count', 'eval_duration']
r = chat('llama3', [{'role': 'user', 'content': 'Hello, world!'}])
print({metric: r[metric] for metric in metrics})
# Prints:
# {'total_duration': 1325679458, 'load_duration': 538143625, 'prompt_eval_duration': 117545000, 'eval_count': 41, 'eval_duration': 667624000}
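With stream=True on the client, the metrics ride along on the final chunk (the one where done is set), just as in the raw-API example above. A minimal sketch of collecting them — the chunks here are simulated dicts standing in for a live server, and metrics_from_stream is a hypothetical helper, not part of the library:

```python
def metrics_from_stream(chunks, keys):
    """Return the timing fields from the final (done=True) chunk of a stream."""
    collected = {}
    for chunk in chunks:
        if chunk.get("done"):
            collected = {k: chunk.get(k) for k in keys}
    return collected

# Simulated chunks, shaped like what chat(model, messages, stream=True) yields;
# only the last chunk carries the metrics.
chunks = [
    {"message": {"content": "Hello"}, "done": False},
    {"message": {"content": "!"}, "done": False},
    {"message": {"content": ""}, "done": True,
     "total_duration": 1325679458, "eval_count": 41, "eval_duration": 667624000},
]

print(metrics_from_stream(chunks, ["total_duration", "eval_count", "eval_duration"]))
```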
The Ollama API has a Metrics object that appears to be returned with both the ChatResponse and GenerateResponse.
I can't find a way to get these metrics using a client with ollama-python. Is it supported?
I would expect something like this to work:
Currently, the above code outputs:
metrics: {}