triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Metrics Port Not Opening with Triton Inference Server's In-Process Python API #7197

Open yucai opened 2 months ago

yucai commented 2 months ago

Description

We are encountering an issue with the Triton Inference Server's in-process Python API where the metrics port (default: 8002) does not open. This results in a 'connection refused' error when attempting to access localhost:8002/metrics. We would appreciate guidance on how to properly enable the metrics port using the in-process Python API.

Triton Version

2.42.0

Steps to reproduce the behavior

  1. Initialize the Triton Inference Server using the in-process Python API with the following code snippet:

     ```python
     import tritonserver

     # Initialize and start the Triton server
     self._triton_server = tritonserver.Server(
         model_repository=model_repository,
         model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
     )
     self._triton_server.start(wait_until_ready=True)
     ```

  2. Attempt to access the metrics endpoint at localhost:8002/metrics (a minimal probe is sketched after this list).
  3. Observe the 'connection refused' error.
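For reference, this is a minimal probe for step 2; it assumes the server from step 1 is running in the same environment and that metrics are expected on the default port 8002:

```python
# Probe the default metrics endpoint (step 2 above).
import urllib.error
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
        print(resp.status, resp.read()[:200])
except urllib.error.URLError as exc:
    # With the in-process Python API this currently fails with
    # "Connection refused" (no metrics HTTP endpoint is listening).
    print("metrics endpoint not reachable:", exc.reason)
```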

**Expected behavior**

The metrics port should be accessible and provide metrics data when the Triton Inference Server is started using the in-process Python API.

**Temporary Workaround**

As a temporary solution, we have started an HTTP server manually to serve the metrics endpoint:

```python
import tritonserver
import uvicorn
import threading
from fastapi import FastAPI
from starlette.responses import Response

# Initialize and start the Triton server
self._triton_server = tritonserver.Server(
    model_repository=['/mount/data/models'],
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
self._triton_server.load('clip')
self._model = self._triton_server.model('clip')

# Set up a FastAPI application to serve metrics
self.app = FastAPI()

@self.app.get("/metrics")
def get_metrics():
    output = self._triton_server.metrics()
    return Response(output, media_type="text/plain")

# Run the FastAPI app in a separate thread
def run():
    uvicorn.run(self.app, host="0.0.0.0", port=8002)

self.server = threading.Thread(target=run)
self.server.start()
```



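If it is useful to anyone else, the same workaround can be done without the FastAPI/uvicorn dependency. This is only a rough standard-library sketch of the same idea; like the code above, it assumes `Server.metrics()` returns the Prometheus-formatted metrics text:

```python
# Stdlib-only variant of the workaround above: expose Server.metrics()
# over HTTP from a background thread.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def serve_metrics(triton_server, host="0.0.0.0", port=8002):
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # Same call as in the FastAPI workaround; assumed to yield
            # Prometheus text.
            body = triton_server.metrics()
            if not isinstance(body, (bytes, bytearray)):
                body = str(body).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    httpd = HTTPServer((host, port), MetricsHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```

It would be called once after the server is started, e.g. `serve_metrics(self._triton_server)`.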
We would prefer to use the built-in functionality for serving metrics and avoid maintaining this workaround. Any suggestions or solutions would be greatly appreciated.
yucai commented 2 months ago

@nnshah1 We are using this API in Ray Data, very similar to what you did for Ray Serve in this example: https://github.com/triton-inference-server/tutorials/blob/main/Triton_Inference_Server_Python_API/examples/rayserve/tritonserver_deployment.py. A rough sketch of our usage pattern is below.
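For context, this is roughly the shape of our Ray Data usage; the dataset, model name, repository path, and batch handling are illustrative placeholders rather than our exact code:

```python
# Sketch: wrap the in-process Triton server in a Ray Data callable class,
# so each map_batches actor hosts its own server instance.
import ray
import tritonserver


class TritonPredictor:
    def __init__(self):
        self._triton_server = tritonserver.Server(
            model_repository=["/mount/data/models"],
            model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
        )
        self._triton_server.start(wait_until_ready=True)
        self._triton_server.load("clip")
        self._model = self._triton_server.model("clip")

    def __call__(self, batch):
        # Run inference on `batch` with self._model here and attach the
        # outputs; the exact input/output handling depends on the model.
        return batch


ds = ray.data.from_items([{"text": "a photo of a cat"}])  # placeholder data
ds = ds.map_batches(TritonPredictor, concurrency=1)
print(ds.take(1))
```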