kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

ExecutionBatchError: Failed "execute_batch": #343

Open MLHafizur opened 1 year ago

MLHafizur commented 1 year ago

We have deployed models using a custom MLServer runtime and are getting the following error during inference:

Task exception was never retrieved
future: <Task finished name='Task-18' coro=<<coroutine without __name__>()> exception=ExecuteBatchError('Failed "execute_batch": (<grpc._cython.cygrpc.SendInitialMetadataOperation object at 0x7fa530536540>, <grpc._cython.cygrpc.SendStatusFromServerOperation object at 0x7fa43c4c2a00>)')>
Traceback (most recent call last):
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 719, in _handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi", line 184, in _send_error_status_from_server
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/callback_common.pyx.pxi", line 98, in execute_batch
grpc._cython.cygrpc.ExecuteBatchError: Failed "execute_batch": (<grpc._cython.cygrpc.SendInitialMetadataOperation object at 0x7fa530536540>, <grpc._cython.cygrpc.SendStatusFromServerOperation object at 0x7fa43c4c2a00>)

I have never seen this error before. Any idea what causes it?

tjohnson31415 commented 1 year ago

I found a couple of issues in the gRPC repo that look relevant:

From the later comments on the second issue, it sounds like this could be related to a client-side disconnect while a gRPC request is being processed with the new AsyncIO API.
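
For context on the first line of the log: "Task exception was never retrieved" is what asyncio reports when a background task fails and nothing ever reads its exception. The sketch below does not use gRPC at all. It is a minimal illustration, assuming a stand-in exception class (SimulatedSendError) and a hypothetical coroutine (send_status_to_client) in place of the real server internals, of how a failure while sending status to a client that has already gone away could surface as that message.

```python
import asyncio
import gc


class SimulatedSendError(Exception):
    """Stand-in for grpc's ExecuteBatchError; NOT the real class."""


async def send_status_to_client(client_connected: bool) -> str:
    # Hypothetical stand-in for the server writing its final status.
    await asyncio.sleep(0)
    if not client_connected:
        raise SimulatedSendError('Failed "execute_batch"')
    return "status sent"


def run_demo() -> list[str]:
    """Return the messages asyncio reports for a task whose exception nobody reads."""
    messages: list[str] = []
    loop = asyncio.new_event_loop()
    # Capture what asyncio would otherwise log at ERROR level.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    async def main() -> None:
        task = loop.create_task(send_status_to_client(client_connected=False))
        await asyncio.sleep(0.01)  # let the task fail in the background
        del task  # drop the last reference without awaiting or calling .exception()
        gc.collect()

    try:
        loop.run_until_complete(main())
        gc.collect()
    finally:
        loop.close()
    return messages


if __name__ == "__main__":
    print(run_demo())
```

If that reading is right, the message is a symptom of the failed send rather than a separate problem: the exception is raised in a task that no caller is waiting on, so asyncio can only log it.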

njhill commented 1 year ago

@MLHafizur this might indicate that the MM (ModelMesh) and/or adapter containers restarted. Could you check whether that's the case?
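
One way to check for restarts is to look at the restartCount field that Kubernetes records for each container in a pod's status. The helper below is a rough sketch that reads JSON in the shape produced by `kubectl get pods -o json` and lists only the containers that have restarted. The function name restart_counts is hypothetical, and the pod and container names in the sample are made up for illustration, not taken from this deployment.

```python
import json


def restart_counts(pod_list_json: str) -> dict[str, dict[str, int]]:
    """Map pod name -> {container name: restartCount} for containers that restarted."""
    restarted: dict[str, dict[str, int]] = {}
    for pod in json.loads(pod_list_json).get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        counts = {
            s["name"]: s["restartCount"]
            for s in statuses
            if s.get("restartCount", 0) > 0
        }
        if counts:
            restarted[pod["metadata"]["name"]] = counts
    return restarted


# Made-up sample in the shape of `kubectl get pods -o json`; the pod and
# container names are illustrative assumptions, not taken from this issue.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "example-runtime-pod"},
            "status": {
                "containerStatuses": [
                    {"name": "mm", "restartCount": 2},
                    {"name": "mlserver", "restartCount": 0},
                    {"name": "mlserver-adapter", "restartCount": 1},
                ]
            },
        }
    ]
})

if __name__ == "__main__":
    print(restart_counts(SAMPLE))
```

In practice the input would be the saved output of `kubectl get pods -n <namespace> -o json` for the namespace where the serving runtime pods run. An empty result means no container in those pods has restarted.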