triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Metrics from the metrics port get mixed when both Triton Model Analyzer and Triton Inference Server are started #5372

Open ApoorveK opened 1 year ago

ApoorveK commented 1 year ago

Description Currently, if containers for both Model Analyzer and the Triton Inference Server are deployed, the data collected from their respective metrics endpoint ports gets mixed up: a query executed in the Prometheus expression browser returns series from both containers, making the results meaningless. The same issue carries over to Grafana dashboards.

One possible solution would be to use two different Prometheus instances (one for the Triton Inference Server and one for Model Analyzer), but the Prometheus service itself does not allow multiple instances (due to the record-keeping entries in its TSDB).

Triton Information
Model Analyzer: latest version
Triton Inference Server: 22.10

Are you using the Triton container or did you build it yourself?

Triton Inference Server: used the following Dockerfile, with a base image built from the Triton Inference Server source repo

# FROM nvcr.io/nvidia/tritonserver:22.10-py3
FROM tritonserver:latest
RUN apt update && apt install -y python3-pip
RUN pip install tensorflow==2.7.0
RUN pip install transformers==2.11.0
RUN pip install tritonclient[all]

Model Analyzer: built from the Triton Model Analyzer source repo

Command for starting the Triton Inference Server:

docker run --shm-size=2g --rm --name trial -p8003:8000 -p8004:8001 -p8005:8002 \
-v $(pwd)/model_repo_triton:/models customtritonimage:latest tritonserver \
--model-repository=/models --model-control-mode=poll

Command for starting Model Analyzer:

docker run -it \
--name model-analyzer-trial \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd):/workspace \
--net=host model-analyzer:latest

docker exec model-analyzer-trial /bin/sh -c "cd /workspace; model-analyzer -v profile -f sweep_ensemble.yaml; exit"

Also, any models can be used to reproduce this use case.

The following prometheus.yml file is used with the Prometheus binary:

# my global config
global:
  scrape_interval: 5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
  evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "triton_Inference_Server"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:8002"] # default metrics port, used by Model Analyzer
      - targets: ["localhost:8005"] # remapped metrics port for the Triton Inference Server container

Prometheus is started with the following command:

./prometheus --config.file=prometheus.yml --web.listen-address="localhost:9090"
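One way to keep the two sources distinguishable within a single Prometheus instance is to give each target its own `job_name`, since Prometheus attaches a `job=<job_name>` label to every series it scrapes. A minimal sketch of such a scrape configuration, assuming the same ports as above (the job names are illustrative, not from the original report):

```yaml
scrape_configs:
  # Metrics from Model Analyzer (default metrics port).
  - job_name: "model_analyzer"
    static_configs:
      - targets: ["localhost:8002"]

  # Metrics from the standalone Triton Inference Server (remapped metrics port).
  - job_name: "triton_inference_server"
    static_configs:
      - targets: ["localhost:8005"]
```

With separate jobs, a query such as `nv_inference_request_success{job="triton_inference_server"}` returns only the standalone server's series, and Grafana panels can filter on the same `job` label.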

Expected behavior Within the single Prometheus data source created from this prometheus.yml file, there should be a way to distinguish the data for Model Analyzer from the data for the Triton Inference Server.
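For programmatic access, the same `job` label can be used to keep the two sources apart when querying the Prometheus HTTP API (`/api/v1/query`). A small hypothetical helper to build such a query, shown as a sketch (the job name and metric are illustrative assumptions, not from the report):

```python
from urllib.parse import urlencode

def build_query_url(base_url: str, metric: str, job: str) -> str:
    """Build a Prometheus HTTP API query URL whose PromQL expression is
    restricted to a single scrape job, so series from Model Analyzer and
    the Triton Inference Server never mix in one result."""
    promql = '%s{job="%s"}' % (metric, job)
    return "%s/api/v1/query?%s" % (base_url.rstrip("/"), urlencode({"query": promql}))

# Example: query only the standalone Triton Inference Server's series.
url = build_query_url("http://localhost:9090", "nv_inference_request_success",
                      "triton_inference_server")
print(url)
```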

kthui commented 1 year ago

cc @GuanLuo @tanmayv25