triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

All gRPC requests to the Triton server are timing out, but HTTP requests are functioning normally. #6899

Open SunnyGhj opened 6 months ago

SunnyGhj commented 6 months ago

Description: All gRPC requests to the Triton server are timing out, but HTTP requests are functioning normally.

Triton Information: 23.10

Are you using the Triton container or did you build it yourself? Container from NGC.

To Reproduce: When using the TensorRT backend, I often encounter a large number of gRPC connection timeouts, while HTTP requests work fine. This indicates that the problem is not with the model but with the RPC path. After restarting the service, gRPC requests return to normal.
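For anyone trying to reproduce this from the client side, a minimal sketch of the failing call (the model name, input name, and shape are placeholders; `client_timeout` is the tritonclient gRPC option that turns a hung call into an explicit deadline error rather than blocking forever):

```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

# Placeholders: adjust to the deployed TensorRT model.
MODEL, INPUT_NAME, SHAPE = "my_trt_model", "INPUT0", (1, 3, 224, 224)

client = grpcclient.InferenceServerClient(url="localhost:8001")

inputs = [grpcclient.InferInput(INPUT_NAME, list(SHAPE), "FP32")]
inputs[0].set_data_from_numpy(np.zeros(SHAPE, dtype=np.float32))

try:
    # client_timeout is in seconds; a hung RPC surfaces as an error here
    # instead of blocking the caller indefinitely.
    client.infer(MODEL, inputs, client_timeout=5.0)
    print("gRPC infer OK")
except InferenceServerException as e:
    print(f"gRPC infer failed or timed out: {e}")
```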

Expected behavior

SunnyGhj commented 6 months ago

After TCP packet-capture analysis, gRPC port 8001 looks normal; it is confirmed that the request reaches tritonserver and ultimately times out.
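The same split (TCP connect succeeds, but the RPC itself never completes) can be checked without a packet capture; a rough sketch, assuming the default ports:

```python
import socket
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

# TCP-level check: does port 8001 accept connections at all?
with socket.create_connection(("localhost", 8001), timeout=3):
    print("TCP connect to :8001 OK")

# RPC-level check: does a trivial gRPC call actually complete?
client = grpcclient.InferenceServerClient(url="localhost:8001")
try:
    print("server live:", client.is_server_live())
except InferenceServerException as e:
    print("gRPC call failed:", e)
```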

SunnyGhj commented 6 months ago

@tanmayv25 @Tabrizian @CoderHam Sincerely asking for help!

biaochen commented 6 months ago

I'm hitting a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the HTTP/gRPC endpoints. The models are TensorRT engines converted from ONNX. After the server starts, everything works fine, but at some point gRPC inference blocks: the statistics show that no requests are performed. If I switch to the HTTP client, inference is OK. It seems the gRPC infer path is blocking; maybe the request has not been passed to the core engine?
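One way to confirm that the blocked gRPC requests never reach the core is to read the per-model statistics over the HTTP endpoint (which still responds); a sketch, with the model name as a placeholder and the default HTTP port assumed:

```python
import tritonclient.http as httpclient

# Query per-model statistics over HTTP (port 8000) to see whether the
# stalled gRPC requests were ever counted by the server core.
client = httpclient.InferenceServerClient(url="localhost:8000")
stats = client.get_inference_statistics(model_name="my_trt_model")  # placeholder

for model in stats["model_stats"]:
    success = model["inference_stats"]["success"]
    print(model["name"], "successful request count:", success["count"])
```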

biaochen commented 6 months ago

[screenshot] After setting log_verbose_level=2, I found more information. It seems the request cannot be fetched from the cq (completion queue). I hope this finding helps the investigation.
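For reproducing the verbose trace above: the server-side flag on the tritonserver CLI is `--log-verbose` (a sketch launching it via Python; the model-repository path is a placeholder):

```python
import subprocess

# Start tritonserver with verbose logging so per-request gRPC lifecycle
# events (including completion-queue activity) appear in the server log.
subprocess.run([
    "tritonserver",
    "--model-repository=/models",  # placeholder path
    "--log-verbose=2",
])
```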

oandreeva-nv commented 6 months ago

Thank you for reporting this. I filed a ticket for our team to investigate: 6211

SunnyGhj commented 6 months ago

If the service state is mistakenly judged as shut down and there are no new requests in the completion queue (cq), will that block all RPC requests? [WeCom screenshot]

secain commented 5 months ago

> I'm hitting a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the HTTP/gRPC endpoints. [...] It seems the gRPC infer path is blocking; maybe the request has not been passed to the core engine?

I'm using the HTTP client in the same environment you described, but I'm facing occasional timeouts: it seems the client can't connect to the server, yet after one occurrence the subsequent requests are fine until the problem occurs again. Are you facing this problem too?

SunnyGhj commented 5 months ago

> I'm using the HTTP client in the same environment you described, but I'm facing occasional timeouts: it seems the client can't connect to the server, yet after one occurrence the subsequent requests are fine until the problem occurs again. Are you facing this problem too?

No, I haven't encountered this problem.

SunnyGhj commented 5 months ago

> If the service state is mistakenly judged as shut down and there are no new requests in the completion queue (cq), will that block all RPC requests?

sssss

SunnyGhj commented 5 months ago

> Thank you for reporting this. I filed a ticket for our team to investigate: 6211

Hi, Andreeva. Is there any progress?

oandreeva-nv commented 5 months ago

The issue is being looked at.

SunnyGhj commented 5 months ago

> The issue is being looked at.

Thanks, looking forward to your reply.