Open · yucai opened this issue 2 months ago
@nnshah1 We are using this API in Ray Data, very similar to what you did for Ray Serve in this example: https://github.com/triton-inference-server/tutorials/blob/main/Triton_Inference_Server_Python_API/examples/rayserve/tritonserver_deployment.py
Description
We are encountering an issue with the Triton Inference Server's in-process Python API: the metrics port (default: 8002) never opens, so attempts to access `localhost:8002/metrics` fail with a 'connection refused' error. We would appreciate guidance on how to properly enable the metrics endpoint when using the in-process Python API.
Triton Version
2.42.0
Steps to reproduce the behavior
Initialize and start the Triton server:

```python
self._triton_server = tritonserver.Server(
    model_repository=model_repository,
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
```
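For completeness, a minimal probe of the failure we see (a sketch; `urllib` is used here only to hit the port):

```python
import urllib.request

# Probe the default metrics port after the server above has started.
# With the in-process Python API nothing is listening on 8002, so this
# fails with "connection refused".
try:
    with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
        print(resp.read().decode())
except OSError as exc:
    print(f"localhost:8002/metrics unreachable: {exc}")
```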
The full code we are running (excerpted from our class, hence `self`), including the FastAPI workaround we currently use to expose metrics:

```python
import tritonserver
import uvicorn
import threading
from fastapi import FastAPI
from starlette.responses import Response

# Initialize and start the Triton server
self._triton_server = tritonserver.Server(
    model_repository=['/mount/data/models'],
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
self._triton_server.load('clip')
self._model = self._triton_server.model('clip')

# Set up a FastAPI application to serve metrics
self.app = FastAPI()

@self.app.get("/metrics")
def get_metrics():
    output = self._triton_server.metrics()
    return Response(output, media_type="text/plain")

# Run the FastAPI app in a separate thread
def run():
    uvicorn.run(self.app, host="0.0.0.0", port=8002)

self.server = threading.Thread(target=run)
self.server.start()
```
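With this shim in place, the usual scrape works again; a quick sanity check (a sketch, same assumptions as above):

```python
import urllib.request

# The response body is the Prometheus-format text returned by
# Server.metrics(), now served by the FastAPI shim on port 8002.
with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
    print(resp.read().decode())
```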