triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

triton aiohttp client report "Timeout context manager should be used inside a task" error #6034

Open DequanZhu opened 1 year ago

DequanZhu commented 1 year ago

I have a Flask app that needs to post an infer request to triton-server while handling a Flask request. I use tritonclient.http.aio to create my client, and to avoid creating the client repeatedly for each request, I create a reusable client in the initialization function, like below.

import asyncio

import tritonclient.http.aio as httpclient
from flask import Flask, request


class DemoModelClient:
    def __init__(self, url, trt_model_name, timeout):
        self.url = url
        self.trt_model_name = trt_model_name
        self.timeout = timeout
        # Client is created once, at import time, outside of any running event loop.
        self.triton_client = self._init_client()

    def _init_client(self):
        triton_client = httpclient.InferenceServerClient(
            url=self.url, verbose=False
        )
        return triton_client

    def prepare_input_data(self, param_entity):
        # some prepare-input-data steps, elided
        # returns (inputs, outputs)
        pass

    async def infer(self, param_entity):
        inputs, outputs = self.prepare_input_data(param_entity)
        infer_result = await self.triton_client.infer(
            self.trt_model_name, inputs, outputs=outputs, timeout=self.timeout
        )
        return infer_result


model_client = DemoModelClient("xxxx", "xxxx", 1000)


def demo_service(param_entity):
    # A fresh event loop is created for every request.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    task = loop.create_task(model_client.infer(param_entity))
    result = loop.run_until_complete(task)
    return result


app = Flask(__name__)


@app.route("/test", methods=["POST"])
def demo_controller():
    param_entity = request.get_json()
    result = demo_service(param_entity)
    return result

The app produces a "Timeout context manager should be used inside a task" error. If I create the triton client inside every call to the infer function, like below, no error occurs:

    async def infer(self, param_entity):
        inputs, outputs = self.prepare_input_data(param_entity)
        # Creating a fresh client inside the coroutine avoids the error.
        triton_client = self._init_client()
        infer_result = await triton_client.infer(
            self.trt_model_name, inputs, outputs=outputs, timeout=self.timeout
        )
        return infer_result

I don't want to create a client for each request because I'm concerned it may have performance implications for the program. Can anyone tell me why reusing the client didn't work?
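For reference, one middle ground (a sketch not taken from this thread, assuming the same prepare-input-data step as above) is to create the client lazily inside the running event loop and cache one client per loop, so it is neither rebuilt on every request nor shared across mismatched loops:

import asyncio

import tritonclient.http.aio as httpclient


class LoopLocalModelClient:
    def __init__(self, url, trt_model_name, timeout):
        self.url = url
        self.trt_model_name = trt_model_name
        self.timeout = timeout
        self._clients = {}  # event loop -> client created on that loop

    def _get_client(self):
        loop = asyncio.get_running_loop()
        if loop not in self._clients:
            # Created while this loop is running, so the underlying aiohttp
            # session is bound to the loop that will actually await the request.
            self._clients[loop] = httpclient.InferenceServerClient(
                url=self.url, verbose=False
            )
        return self._clients[loop]

    async def infer(self, inputs, outputs):
        # inputs/outputs are assumed to be prepared by the caller.
        client = self._get_client()
        return await client.infer(
            self.trt_model_name, inputs, outputs=outputs, timeout=self.timeout
        )

Note this only pays off if the app reuses an event loop across requests; with a brand-new loop per request, as in demo_service above, it degenerates back into one client per request.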

DequanZhu commented 1 year ago

My triton client is version 2.26.0.

gesanqiu commented 5 months ago

@DequanZhu Hi, have you solved this issue? I'm also hitting it, and I can't even get a single call to succeed...

gesanqiu commented 5 months ago

@nnshah1 any progress on this issue?

gesanqiu commented 4 months ago

After further investigation, this happens because the triton_client is not created in the same event loop that the Flask app uses to run the request. Making sure the triton aio client and the Flask app share the same event loop fixes the issue, for example by initializing the DemoModelClient in the startup event of the app.
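Plain Flask views are synchronous and the snippet above spins up a new event loop per request, so one way to follow this same-loop advice (a minimal sketch, not from this thread; the model name, server URL, and build_request helper are placeholders) is to run a single long-lived event loop in a background thread, create the Triton aio client on that loop, and submit every infer call to it:

import asyncio
import threading

import tritonclient.http.aio as httpclient
from flask import Flask

# One long-lived event loop, driven by a background thread.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


async def _make_client(url):
    # Runs on _loop, so the client (and its aiohttp session) is created
    # on the same loop that will later await every infer call.
    return httpclient.InferenceServerClient(url=url, verbose=False)


triton_client = asyncio.run_coroutine_threadsafe(
    _make_client("localhost:8000"), _loop
).result()

app = Flask(__name__)


@app.route("/test", methods=["POST"])
def demo_controller():
    inputs, outputs = build_request()  # hypothetical helper, stands in for prepare_input_data
    future = asyncio.run_coroutine_threadsafe(
        triton_client.infer("my_model", inputs, outputs=outputs, timeout=1000),
        _loop,
    )
    infer_result = future.result()
    # Extract whatever tensors are needed, e.g. infer_result.as_numpy("<output_name>")
    return {"status": "ok"}

The sync view never touches the loop directly; run_coroutine_threadsafe hands the coroutine to the background loop and blocks the Flask worker thread until the result is ready, so the client is always used on the loop it was created on.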