opea-project / GenAIComps

GenAI components at micro-service level; GenAI service composer to create mega-service
Apache License 2.0

vLLM Metrics for Prometheus #431

Open nacartwright opened 2 months ago

nacartwright commented 2 months ago

The vLLM metrics are not showing correctly. They should include time to first token, number of running requests, CPU/GPU cache usage, etc., as shown here:

https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py
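
(A quick way to see which metrics a vLLM server actually exposes is to fetch its /metrics endpoint directly. A minimal sketch, assuming vLLM is serving on localhost:8000; adjust the URL to your deployment.)

import urllib.request

VLLM_METRICS_URL = "http://localhost:8000/metrics"  # assumed host/port

with urllib.request.urlopen(VLLM_METRICS_URL, timeout=5) as resp:
    body = resp.read().decode("utf-8")

# vLLM's metric families are prefixed "vllm:"; list their HELP lines.
for line in body.splitlines():
    if line.startswith("# HELP vllm:"):
        print(line)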

devpramod commented 2 months ago

@kevinintel The metrics explorer in Prometheus does not show any vLLM-related metrics.

lvliang-intel commented 2 months ago

You should connect to the vLLM serving endpoint, not the LLM microservice endpoint.
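
(For illustration, a small probe that checks which of the two endpoints actually serves Prometheus metrics. Both ports are assumptions, vLLM is often on 8000 and the OPEA LLM microservice on 9000, so substitute your own.)

import urllib.error
import urllib.request

# Assumed ports: vLLM serving endpoint on 8000, LLM microservice on 9000.
ENDPOINTS = {
    "vLLM serving endpoint": "http://localhost:8000/metrics",
    "LLM microservice endpoint": "http://localhost:9000/metrics",
}

for name, url in ENDPOINTS.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{name}: HTTP {resp.status}")  # a 200 here is the scrape target
    except urllib.error.URLError as exc:
        print(f"{name}: no metrics served ({exc.reason})")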

kevinintel commented 2 months ago

Please modify the scrape endpoint in prometheus.yml, for example:

static_configs:
  - targets: ["<vllm_host>:<port>"]  # placeholder: point at the vLLM serving endpoint

devpramod commented 1 month ago

@nacartwright I was able to test this successfully on a local machine with the following Prometheus config:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "vllm"
    static_configs:
      - targets: ["external_ip:port"]
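
(Once Prometheus picks up that config, after a restart or a config reload, the scrape can be confirmed from its HTTP API as well as from the metrics explorer. A sketch, assuming Prometheus listens on localhost:9090.)

import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090/api/v1/query"  # assumed Prometheus address
query = urllib.parse.urlencode({"query": 'up{job="vllm"}'})

with urllib.request.urlopen(f"{PROM_URL}?{query}", timeout=5) as resp:
    data = json.load(resp)

# up = 1 means the vLLM target was scraped; the vllm:* series should then
# appear in the metrics explorer.
for result in data["data"]["result"]:
    print(result["metric"].get("instance"), "up =", result["value"][1])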