codearranger opened 7 months ago
I would like to work on this one. I have worked on several Prometheus metrics integrations on Go apps before.
+1
I can help with cardinality exploration, sizing of labels, and reviews, but I haven't gone through the full code base yet to check where we could add the metric counters.
By default the Prometheus Go client exports some runtime metrics (GC, memory, goroutines), but that only covers the application baseline.
Let me know if I can help with the reviews.
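For reference, exposing those built-in collectors is roughly the setup below (a sketch based on the Prometheus Go guide; the port and layout are placeholders, not anything from the ollama code base):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry already includes the Go collector (GC, memory,
	// goroutines) and the process collector, so this alone exposes the
	// "application baseline" metrics mentioned above.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}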
Great, I already started by adding the metrics endpoint and trying to add a few custom metrics. I will share which metrics I'm trying to add initially and what it would generally look like.
I added a metrics endpoint with custom metrics for request counts.
example:
# curl http://127.0.0.1:11434/metrics | grep -i ollama
# HELP ollama_model_list_requests_total The total number of model list requests that have been attempted.
# TYPE ollama_model_list_requests_total counter
ollama_model_list_requests_total{action="list",status="OK",status_code="200"} 1
# HELP ollama_model_pull_requests_total The total number of model pulls that have been attempted.
# TYPE ollama_model_pull_requests_total counter
ollama_model_pull_requests_total{action="pull",status="OK",status_code="200"} 1
# HELP ollama_model_requests_total The total number of requests on all endpoints.
# TYPE ollama_model_requests_total counter
ollama_model_requests_total{action="all",status="OK",status_code="200"} 6
I will not add more in the first PR to make it simpler.
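For anyone curious, a counter like the one in the sample output can be defined with the Prometheus Go client roughly as below. The metric name and label set mirror the output above, but where the increment hooks into the ollama handlers is still up to the PR; this is a sketch, not the actual patch:

package main

import (
	"net/http"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Counter mirroring the labels in the sample output above (action, status, status_code).
var modelRequests = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "ollama_model_requests_total",
		Help: "The total number of requests on all endpoints.",
	},
	[]string{"action", "status", "status_code"},
)

// recordRequest would be called from each HTTP handler after the response is written.
func recordRequest(action, status string, statusCode int) {
	modelRequests.WithLabelValues(action, status, strconv.Itoa(statusCode)).Inc()
}

func main() {
	recordRequest("pull", "OK", 200) // example increment
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":11434", nil)
}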
This is nice!
So I've got a lot of thoughts about this. I think metrics and traces both need to be added, but it would be nicer to use OpenTelemetry instead of Prometheus clients; that would also bring traces, which would be invaluable for debugging issues.
There's an open PR on adding semantic conventions for LLM applications, but its focus is more on the API side of things, and I think ollama could provide a pretty good use case for standardizing telemetry data for the internal nitty-gritty of LLMs; think perplexity, predicted token loss, etc.
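To make the semantic-conventions point concrete, a trace span could carry attributes like the ones below. This is only a sketch: the gen_ai.* names follow the draft OpenTelemetry GenAI semantic conventions, the token counts are placeholder values, and nothing here reflects ollama's actual code:

package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// generate shows the kind of attributes a span could carry. A real setup
// would also configure an SDK TracerProvider and exporter at startup.
func generate(ctx context.Context, model string) {
	_, span := otel.Tracer("ollama").Start(ctx, "chat "+model)
	defer span.End()

	// ... run inference ...

	span.SetAttributes(
		attribute.String("gen_ai.system", "ollama"),
		attribute.String("gen_ai.request.model", model),
		attribute.Int("gen_ai.usage.input_tokens", 42),   // placeholder
		attribute.Int("gen_ai.usage.output_tokens", 128), // placeholder
	)
}

func main() {
	generate(context.Background(), "llama3")
}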
Hey folks, I am the maintainer of the OpenLIT project.
We built OpenTelemetry tracing and metrics for the Python Ollama client (following the OTel semantic conventions for LLMs) in the OpenLIT SDK. This basically gives you tracing and metrics for the API side of things: prompts, responses, tokens, and some request and response metadata.
You can check it out here: https://github.com/openlit/openlit. Let me know if you have any thoughts, as I feel this is closely related to this open issue as well.
We are also evaluating how we could use the same library to extract GPU metrics (possibly via nvidia-smi). If anyone has thoughts on that, I would love to hear them.
@patcher9 you can export different GPU metrics using the NVIDIA DCGM exporter.
If metrics get integrated, I think one should definitely go with OpenTelemetry and not with a product-specific client such as Prometheus. I use Prometheus as well, but I can just hook it up via OTLP. Tracing is also an important aspect.
As nice as OpenLIT seems, having the data come from the inference server itself is far more beneficial in my distributed setup, and its absence is what is keeping me from using Ollama.
Queue length would be an important metric to me.
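As a sketch of what the OpenTelemetry route could look like in Go: the instrument names, the queue-length metric, and the Prometheus-exporter bridge below are my assumptions, not anything already in ollama; swapping the reader for an OTLP exporter would push the same instruments instead of scraping them:

package main

import (
	"context"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// The Prometheus exporter doubles as a metric reader, so the same
	// instruments can be scraped via /metrics or pushed over OTLP by
	// swapping the reader.
	exporter, err := prometheus.New()
	if err != nil {
		panic(err)
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	otel.SetMeterProvider(provider)

	meter := otel.Meter("ollama")
	requests, _ := meter.Int64Counter("ollama.requests",
		metric.WithDescription("Total requests served"))
	queueLen, _ := meter.Int64UpDownCounter("ollama.queue.length",
		metric.WithDescription("Requests waiting for a runner"))

	ctx := context.Background()
	requests.Add(ctx, 1, metric.WithAttributes(attribute.String("action", "pull")))
	queueLen.Add(ctx, 1) // +1 when enqueued, Add(ctx, -1) when scheduled

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}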
Thanks @nsankar. We were looking for something that is OpenTelemetry-native (the DCGM exporter's metrics are Prometheus-style), so we built an OpenTelemetry variant of the DCGM exporter ourselves:
https://github.com/openlit/openlit/tree/main/otel-gpu-collector
It would be interesting to have these metrics for production deployments.
Thoughts on the approach of defining and implementing a basic set of metrics first and then splitting additional metrics / tracing into other issues? Seems like that might be a good way to move this one forward.
+1 to: https://github.com/ollama/ollama/issues/3144#issuecomment-2041644151
We would appreciate a Prometheus /metrics endpoint, too.
It would be nice if Ollama had a /metrics endpoint for collecting metrics with Prometheus or other monitoring tools.
https://prometheus.io/docs/guides/go-application/
Some metrics to include might be GPU utilization, memory utilization, CPU utilization, layers used, request counts, etc.
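As a very rough illustration of the GPU side, a gauge could be fed by polling nvidia-smi. This is only a sketch with made-up metric names; a real implementation would more likely use NVML or the data ollama already gathers for scheduling:

package main

import (
	"net/http"
	"os/exec"
	"strconv"
	"strings"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical gauge name; not an existing ollama metric.
var gpuUtil = promauto.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "ollama_gpu_utilization_percent",
		Help: "GPU utilization as reported by nvidia-smi.",
	},
	[]string{"gpu"},
)

// pollGPU shells out to nvidia-smi every 15 seconds and updates the gauge
// with one sample per GPU index.
func pollGPU() {
	for range time.Tick(15 * time.Second) {
		out, err := exec.Command("nvidia-smi",
			"--query-gpu=index,utilization.gpu",
			"--format=csv,noheader,nounits").Output()
		if err != nil {
			continue
		}
		for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
			fields := strings.Split(line, ", ")
			if len(fields) != 2 {
				continue
			}
			util, err := strconv.ParseFloat(fields[1], 64)
			if err != nil {
				continue
			}
			gpuUtil.WithLabelValues(fields[0]).Set(util)
		}
	}
}

func main() {
	go pollGPU()
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}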