ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

add /metrics endpoint #3144

Open codearranger opened 7 months ago

codearranger commented 7 months ago

It would be nice if Ollama had a /metrics endpoint for collecting metrics for Prometheus or other monitoring tools.

https://prometheus.io/docs/guides/go-application/

Some metrics to include might be GPU utilization, memory utilization, CPU utilization, layers used, request counts, etc.
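
For reference, the linked guide boils down to something like this in a Gin-based server (a rough sketch; the router setup and port here are illustrative, not Ollama's actual wiring):

```go
// Minimal sketch of mounting a Prometheus /metrics endpoint on a Gin router.
package main

import (
	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	r := gin.Default()

	// Expose the default registry, which already includes Go runtime and
	// process metrics; custom counters would be registered on top of it.
	r.GET("/metrics", gin.WrapH(promhttp.Handler()))

	r.Run(":11434") // port chosen for illustration only
}
```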

amila-ku commented 7 months ago

I would like to work on this one. I have worked on several Prometheus metrics integrations in Go apps before.

aliirz commented 7 months ago

+1

yuliyantsvetkov commented 7 months ago

I can help with cardinality exploration, sizing of labels, and reviews, but I haven't opened the full code base to check where we can add the metric counters.

By default the Prometheus Go client exports some runtime metrics like GC, memory, and goroutines, but that is just the application baseline.

Let me know if I can help with the reviews.
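
For context, that baseline looks roughly like this with an explicit registry (a sketch using the standard client_golang collectors; any Ollama-specific counters would be registered on top):

```go
// Sketch: the "app base" metrics come from the Go and process collectors.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()

	// These provide go_gc_*, go_memstats_*, go_goroutines, process_* metrics.
	reg.MustRegister(
		collectors.NewGoCollector(),
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
	)

	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	http.ListenAndServe(":2112", nil)
}
```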

amila-ku commented 7 months ago

> I can help with cardinality exploration, sizing of labels, and reviews, but I haven't opened the full code base to check where we can add the metric counters.
>
> By default the Prometheus Go client exports some runtime metrics like GC, memory, and goroutines, but that is just the application baseline.
>
> Let me know if I can help with the reviews.

Great, I already started by adding the metrics endpoint and trying to add a few custom metrics. I will share which metrics I'm trying to add initially and how it would generally look.

amila-ku commented 7 months ago

I added a metrics endpoint with custom metrics for request counts.

example:

# curl http://127.0.0.1:11434/metrics | grep -i ollama
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6664    0  6664    0     0   519k      0 --:--:-- --:--:-- --:--:--  542k
# HELP ollama_model_list_requests_total The total number of model list requests that have been attempted.
# TYPE ollama_model_list_requests_total counter
ollama_model_list_requests_total{action="list",status="OK",status_code="200"} 1
# HELP ollama_model_pull_requests_total The total number of model pulls that have been attempted.
# TYPE ollama_model_pull_requests_total counter
ollama_model_pull_requests_total{action="pull",status="OK",status_code="200"} 1
# HELP ollama_model_requests_total The total number of requests on all endpoints.
# TYPE ollama_model_requests_total counter
ollama_model_requests_total{action="all",status="OK",status_code="200"} 6

I will not add more in the first PR to make it simpler.
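
Roughly, the counters above could be declared like this (a sketch, not necessarily the exact code in the PR):

```go
// Hypothetical declaration for one of the counters shown in the output above,
// using promauto so it registers itself on the default registry.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var ModelPullRequestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "ollama_model_pull_requests_total",
		Help: "The total number of model pulls that have been attempted.",
	},
	[]string{"action", "status", "status_code"},
)

// A handler would then record an outcome like:
//   ModelPullRequestsTotal.WithLabelValues("pull", "OK", "200").Inc()
```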

sammcj commented 7 months ago

This is nice!

joshcarp commented 6 months ago

So I've got a lot of thoughts about this. I think metrics and traces need to be added, but it would be nice to add OpenTelemetry instead of Prometheus clients; this would also have the added benefit of traces, which would be invaluable for debugging issues.

There's an open PR on adding semantic conventions for LLM applications, but its focus is more on the API side of things, and I think Ollama could provide a pretty good use case for standardizing telemetry data for the internal nitty-gritty of LLMs; think perplexity, predicted token loss, etc.
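
As a sketch of what that could look like (assuming the otel-go SDK with its Prometheus exporter; the instrument name and attribute are illustrative, not from an actual PR):

```go
// Sketch: OpenTelemetry instrumentation exported in Prometheus format, so the
// same /metrics endpoint keeps working while the instrumentation stays
// vendor-neutral and traces can be added with the same SDK later.
package main

import (
	"context"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// The Prometheus exporter acts as a reader for the meter provider and
	// registers itself on the default Prometheus registry.
	exporter, err := prometheus.New()
	if err != nil {
		panic(err)
	}
	otel.SetMeterProvider(sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)))

	meter := otel.Meter("ollama")
	requests, _ := meter.Int64Counter("ollama.requests",
		metric.WithDescription("Total requests handled by the server."))
	requests.Add(context.Background(), 1,
		metric.WithAttributes(attribute.String("action", "pull")))

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```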

patcher9 commented 5 months ago

Hey folks, I am the maintainer of the OpenLIT project

We built OpenTelemetry tracing and metrics for the Python Ollama client (following the OTel semantic conventions for LLMs) in the OpenLIT SDK. This basically gives tracing and metrics for the API side of things: prompts, responses, tokens, and some request and response metadata.

You can check it out here: https://github.com/openlit/openlit. Let me know if you have any thoughts on this, as I feel it is closely related to this open issue as well.

We are evaluating how we could extract GPU metrics using the same library (possibly via nvidia-smi). If anyone has thoughts on that, we would love to hear them.

nsankar commented 5 months ago

@patcher9 you can export different GPU metrics using the NVIDIA DCGM exporter.

Agash commented 5 months ago

> So I've got a lot of thoughts about this. I think metrics and traces need to be added, but it would be nice to add OpenTelemetry instead of Prometheus clients; this would also have the added benefit of traces, which would be invaluable for debugging issues.

If metrics get integrated, I think one should definitely go with OpenTelemetry and not with a product-specific client such as Prometheus. I use Prometheus as well, but I can just hook it up via OTLP. Tracing is also an important aspect.

As nice as OpenLIT seems, having the data come from the inference server itself is far more beneficial in my distributed use case, and the lack of it is what keeps me from using Ollama.

kennethwolters commented 4 months ago

Queue length would be an important metric to me.
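
For illustration, that could be a simple gauge updated by the scheduler (a sketch; the metric name and the hooks are assumptions, not existing Ollama code):

```go
// Hypothetical gauge for the requested queue-length metric.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var QueueLength = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "ollama_pending_requests",
	Help: "Number of requests currently waiting to be scheduled onto a runner.",
})

// The scheduler would call QueueLength.Inc() when a request is queued and
// QueueLength.Dec() when it starts running.
```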

patcher9 commented 4 months ago

> @patcher9 you can export different GPU metrics using the NVIDIA DCGM exporter.

Thanks @nsankar. We were looking for something that is OpenTelemetry-native (DCGM Exporter metrics are Prometheus-style), so we built an OpenTelemetry variant of the DCGM exporter ourselves.

https://github.com/openlit/openlit/tree/main/otel-gpu-collector

maher-naija-pro commented 2 months ago

It would be interesting to have these metrics for production deployments.


nstogner commented 2 months ago

Thoughts on the approach of defining and implementing a basic set of metrics first and then splitting additional metrics / tracing into other issues? Seems like that might be a good way to move this one forward.

+1 to: https://github.com/ollama/ollama/issues/3144#issuecomment-2041644151

francescor commented 1 month ago

We would appreciate a Prometheus /metrics endpoint, too.