@tjohnson31415 @njhill It is mentioned here https://github.com/kserve/modelmesh-serving/blob/main/docs/model-formats/onnx.md that the inputs and outputs of the model can be inferred from the model data. How can I determine them so I can use them in the client script?
There are a couple of different ways to determine the inputs and outputs of the exported ONNX model:
1. Use the ModelMetadata gRPC API to query Triton for info about the loaded model (you could use get_model_metadata("pipeline-poc-inference") from the Triton client to send the request; a small sketch follows below).
2. Inspect the ONNX model directly by loading it into memory in a Python session/script:

    import onnx
    model = onnx.load('path/to/model.onnx')
    print(model.graph.input)
    print(model.graph.output)
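For the first option, here is a minimal sketch using the tritonclient Python package (the URL assumes the port-forward to localhost:8033 used elsewhere in this thread, and the model name is the one from above):

import tritonclient.grpc as grpcclient

# Query the server for the loaded model's metadata over gRPC
client = grpcclient.InferenceServerClient(url="localhost:8033")
metadata = client.get_model_metadata(model_name="pipeline-poc-inference")
print(metadata)  # lists the input/output names, datatypes, and shapes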
Thanks @tjohnson31415, the first option worked well. I got the model metadata:
platform: "onnxruntime_onnx"
inputs {
  name: "attention_mask"
  datatype: "INT64"
  shape: 32
  shape: 64
}
inputs {
  name: "input.1"
  datatype: "INT64"
  shape: 32
  shape: 64
}
outputs {
  name: "1643"
  datatype: "FP32"
  shape: 32
  shape: 16
}
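As a small aside, the fields of that ModelMetadataResponse can also be read programmatically to drive the client; a sketch assuming the same response object and service_pb2 stubs used in the script below:

# Print each declared tensor's name, datatype, and shape
for tensor in list(response.inputs) + list(response.outputs):
    print(tensor.name, tensor.datatype, list(tensor.shape))
# -> attention_mask INT64 [32, 64]
#    input.1 INT64 [32, 64]
#    1643 FP32 [32, 16]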
But now I am struggling to figure out an input datatype issue, although I have tried many different approaches:
Traceback (most recent call last):
File "predict.py", line 97, in <module>
response = grpc_stub.ModelInfer(request)
File "/home/hafizur/miniconda3/envs/onnx/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/hafizur/miniconda3/envs/onnx/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "inference.GRPCInferenceService/ModelInfer: INVALID_ARGUMENT: unexpected explicit tensor data for input tensor 'attention_mask' for model 'pipeline-poc-inference__isvc-211152d1e7' of type 'INT32', expected datatype 'INT64'"
debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8033 {created_time:"2023-02-22T15:50:59.6900669-05:00", grpc_status:3, grpc_message:"inference.GRPCInferenceService/ModelInfer: INVALID_ARGUMENT: unexpected explicit tensor data for input tensor \'attention_mask\' for model \'pipeline-poc-inference__isvc-211152d1e7\' of type \'INT32\', expected datatype \'INT64\'"}"
The input data look like this:
[[ 101 2424 2041 ... 1997 1037 102]
[ 101 1996 2343 ... 9228 2003 102]
[ 101 9710 22002 ... 1010 2256 102]
...
[ 101 26624 2139 ... 2055 4825 102]
[ 101 2508 22889 ... 0 0 0]
[ 101 10352 10958 ... 2053 2386 102]]
[[1 1 1 ... 1 1 1]
[1 1 1 ... 1 1 1]
[1 1 1 ... 1 1 1]
...
[1 1 1 ... 1 1 1]
[1 1 1 ... 0 0 0]
[1 1 1 ... 1 1 1]]
b_input_ids data type: int64
b_input_ids shape: (32, 64)
b_input_mask data type: int64
b_input_mask shape: (32, 64)
Here is the final code I tried to run:
import os

import grpc
import joblib
import numpy as np

# service_pb2 / service_pb2_grpc are the stubs generated from the KServe v2
# inference gRPC protos (assumed here to be the ones shipped as
# tritonclient.grpc.service_pb2 and tritonclient.grpc.service_pb2_grpc)
from tritonclient.grpc import service_pb2, service_pb2_grpc

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()
input_path = "."
# load the needed input (as needed by ONNX)
dataloader = joblib.load(os.path.join(input_path, "eval_dataloader.pkl"))
# pick a sample batch as sample input
batches = [batch for batch in dataloader]
_batch = batches[0]
# inputs needed for the model
#b_input_ids = _batch[0].to('cpu').long().to(torch.int64)
b_input_ids = _batch[0].to('cpu')
b_input_ids = to_numpy(b_input_ids)
#b_input_mask = _batch[1].to('cpu').long().to(torch.int64)
b_input_mask = _batch[1].to('cpu')
b_input_mask = to_numpy(b_input_mask)
print(b_input_ids)
print(b_input_mask)
print("b_input_ids data type:", b_input_ids.dtype)
print("b_input_ids shape:", b_input_ids.shape)
print("b_input_mask data type:", b_input_mask.dtype)
print("b_input_mask shape:", b_input_mask.shape)
# Send request to the server
grpc_channel = grpc.insecure_channel("localhost:8033")
grpc_stub = service_pb2_grpc.GRPCInferenceServiceStub(grpc_channel)
model_name = "pipeline-poc-inference"
model_version = ""
request = service_pb2.ModelMetadataRequest(name=model_name, version=model_version)
response = grpc_stub.ModelMetadata(request)
print("model metadata:\n{}".format(response))
#b_input_mask = b_input_mask.astype('int64')
#b_input_mask = b_input_mask.astype(np.int64)
b_input_mask = np.array(b_input_mask, dtype=np.int64)
# Infer
request = service_pb2.ModelInferRequest()
request.model_name = model_name
request.model_version = model_version
request.id = "my request id"
input0 = service_pb2.ModelInferRequest().InferInputTensor()
input0.name = "attention_mask"
# input0.datatype = "INT64"
# input0.shape.extend([32, 64])
# #input0.contents.int_contents[:] = b_input_mask
# #input0.contents.int_contents[:] = b_input_mask.tolist()
# #input0.contents.int_contents[:] = list(map(int, b_input_mask.tolist()))
# input0.contents.int_contents[:] = list(map(int, b_input_mask.ravel().tolist()))
input1 = service_pb2.ModelInferRequest().InferInputTensor()
input1.name = "input.1"
# input1.datatype = "INT64"
# input1.shape.extend([32, 64])
# #input0.contents.int_contents[: : ] = b_input_ids
# #input1.contents.int_contents[:] = b_input_ids.tolist()
# #input1.contents.int_contents[:] = list(map(int, b_input_ids.tolist()))
# input1.contents.int_contents[:] = list(map(int, b_input_ids.ravel().tolist()))
input0.datatype = "INT64"
input0.shape.extend([b_input_mask.shape[0], b_input_mask.shape[1]])
input0.contents.int_contents.extend(b_input_mask.ravel().tolist())
input1.datatype = "INT64"
input1.shape.extend([b_input_ids.shape[0], b_input_ids.shape[1]])
input1.contents.int_contents.extend(b_input_ids.ravel().tolist())
request.inputs.extend([input0, input1])
output0 = service_pb2.ModelInferRequest().InferRequestedOutputTensor()
output0.name = "1643"
request.outputs.extend([output0])
response = grpc_stub.ModelInfer(request)
print("response:\n{}".format(response))
What could be the issue?
I think the next issue is that you are using input0.contents.int_contents.extend() instead of input0.contents.int64_contents.extend() (note int vs int64). int_contents is for 32-bit integers (ref).
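For reference, a minimal sketch of the corrected packing (the array names are the ones from the script above), plus the raw_input_contents alternative that the v2 protocol also defines:

# Corrected packing for INT64 tensors: int64_contents instead of int_contents
input0.datatype = "INT64"
input0.shape.extend(b_input_mask.shape)
input0.contents.int64_contents.extend(b_input_mask.ravel().tolist())

input1.datatype = "INT64"
input1.shape.extend(b_input_ids.shape)
input1.contents.int64_contents.extend(b_input_ids.ravel().tolist())

# Alternative sketch: send the raw little-endian bytes instead of the typed
# contents fields, in the same order as request.inputs, and leave
# input*.contents unset.
# request.raw_input_contents.extend([b_input_mask.tobytes(), b_input_ids.tobytes()])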
Hi @tjohnson31415, I noticed that and got past the error. Thank you very much for your support. But now I am struggling with an error from the server. I am port-forwarding:
(base) hafizur@TOR-RAHMANHAFIZ:~$ kubectl port-forward service/modelmesh-serving 8033 -n modelmesh-serving-dev
Forwarding from 127.0.0.1:8033 -> 8033
Forwarding from [::1]:8033 -> 8033
Handling connection for 8033
Traceback (most recent call last):
File "predict.py", line 47, in <module>
response = grpc_stub.ModelMetadata(request)
File "/home/hafizur/miniconda3/envs/onnx/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/hafizur/miniconda3/envs/onnx/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Nowhere available to load"
debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8033 {grpc_message:"Nowhere available to load", grpc_status:13, created_time:"2023-02-22T18:40:56.3422454-05:00"}"
>
The logs from the mm container in the Triton runtime pod are:
"instant":{"epochSecond":1677108337,"nanoOfSecond":213884861},"thread":"ll-conn-retry-thread-1","level":"ERROR","loggerName":"com.ibm.watson.litelinks.client.ServiceInstance","message":"Failed to open new connection to 10.244.54.20:8080;v=20230111-f9487: com.ibm.watson.litelinks.TTimeoutException: opening new channel failed: /10.244.54.20:8080 (TIMED_OUT)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":272,"threadPriority":5}
{"instant":{"epochSecond":1677108343,"nanoOfSecond":569292479},"thread":"ll-conn-retry-thread-2","level":"ERROR","loggerName":"com.ibm.watson.litelinks.client.ServiceInstance","message":"Failed to open new connection to 10.244.41.8:8080;v=20230111-f9487: com.ibm.watson.litelinks.WTTransportException: opening new channel failed: /10.244.41.8:8080 (UNKNOWN)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":273,"threadPriority":5}
{"instant":{"epochSecond":1677108351,"nanoOfSecond":202538341},"thread":"invoke-ho-pipeline-poc-inference__isvc-211152d1e7","level":"WARN","loggerName":"com.ibm.watson.modelmesh.SidecarModelMesh","message":"Triggered \"cleanup\" unload for model pipeline-poc-inference__isvc-211152d1e7 after unexpected NOT_FOUND received from inference request","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","contextMap":{},"threadId":56,"threadPriority":5}
{"instant":{"epochSecond":1677108351,"nanoOfSecond":202657349},"thread":"invoke-ho-pipeline-poc-inference__isvc-211152d1e7","level":"ERROR","loggerName":"com.ibm.watson.modelmesh.SidecarModelMesh","message":"Error invoking inference.GRPCInferenceService/ModelMetadata method on model pipeline-poc-inference__isvc-211152d1e7: UNAVAILABLE: Request for unknown model: 'pipeline-poc-inference__isvc-211152d1e7' is not found","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","contextMap":{},"threadId":56,"threadPriority":5}
{"instant":{"epochSecond":1677108351,"nanoOfSecond":204890492},"thread":"invoke-ho-pipeline-poc-inference__isvc-211152d1e7","level":"WARN","loggerName":"com.ibm.watson.modelmesh.ModelMesh","message":"ModelRuntime in instance c4dc88-c79rn returned unexpected NOT_FOUND for model pipeline-poc-inference__isvc-211152d1e7; purging from local cache and registration","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","contextMap":{},"threadId":56,"threadPriority":5}
{"instant":{"epochSecond":1677108364,"nanoOfSecond":941712037},"thread":"janitor-task","level":"INFO","loggerName":"com.ibm.watson.modelmesh.ModelMesh","message":"Janitor registry pruning task took 2ms for 0/11 entries","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","contextMap":{},"threadId":40,"threadPriority":5}
{"instant":{"epochSecond":1677108374,"nanoOfSecond":42354088},"thread":"ll-conn-retry-thread-2","level":"ERROR","loggerName":"com.ibm.watson.litelinks.client.ServiceInstance","message":"Failed to open new connection to 10.244.41.8:8080;v=20230111-f9487: com.ibm.watson.litelinks.WTTransportException: opening new channel failed: /10.244.41.8:8080 (UNKNOWN)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":273,"threadPriority":5}
{"instant":{"epochSecond":1677108374,"nanoOfSecond":837546364},"thread":"ll-conn-retry-thread-1","level":"ERROR","loggerName":"com.ibm.watson.litelinks.client.ServiceInstance","message":"Failed to open new connection to 10.244.54.20:8080;v=20230111-f9487: com.ibm.watson.litelinks.TTimeoutException: opening new channel failed: /10.244.54.20:8080 (TIMED_OUT)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":272,"threadPriority":5}
{"instant":{"epochSecond":1677108375,"nanoOfSecond":706869303},"thread":"mm-task-thread-1","level":"INFO","loggerName":"com.ibm.watson.modelmesh.ModelMesh","message":"Published new instance record: InstanceRecord [lruTime=never, count=0, capacity=113542, used=0 (0%), loc=10.214.4.65, zone=<none>, labels=[mt:keras, mt:keras:2, mt:onnx, mt:onnx:1, mt:pytorch, mt:pytorch:1, mt:tensorflow, mt:tensorflow:1, mt:tensorflow:2, mt:tensorrt, mt:tensorrt:7, pv:grpc-v2, pv:v2, rt:triton-2.x], startTime=1676346243353 (9 days ago), vers=0, loadThreads=2, loadInProg=0, reqsPerMin=0], UBW=1146, TUW=0, TCO=0","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","contextMap":{},"threadId":39,"threadPriority":5}
{"instant":{"epochSecond":1677108405,"nanoOfSecond":715101251},"thread":"ll-conn-retry-thread-2","level":"ERROR","loggerName":"com.ibm.watson.litelinks.client.ServiceInstance","message":"Failed to open new connection to 10.244.41.8:8080;v=20230111-f9487: com.ibm.watson.litelinks.WTTransportException: opening new channel failed: /10.244.41.8:8080 (UNKNOWN)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":273,"threadPriority":5}
I don't see any useful logs in the triton or triton-adapter containers.
This issue looks a bit tougher 🤔
The "Failed to open new connection" errors to port 8080 indicate a communication issue between the ModelMesh containers (port 8080 is what the mm containers use to communicate with each other). The unexpected NOT_FOUND errors indicate that mm thought the model was loaded, but the runtime reported that it wasn't. My hunch from these errors is that there is a networking issue and/or containers are restarting. Are there any restarts reported on the pods?
If there are restarts, my guess would be that it is a memory issue, but you can check the kubectl describe output to see what exit code is reported.
The pod restarted the first time I tried; later tries did not hit a restart, only the error.
modelmesh-serving-triton-2.x-54fbc4dc88-c79rn 4/4 Running 1 (95m ago) 8d
modelmesh-serving-triton-2.x-54fbc4dc88-t2mlg 4/4 Running 0 31h
Hmm. It seems that you had the model loaded and could use the ModelMetadata call, but then, when trying an inference call, the Triton container crashed and restarted? It is still worth looking into that restart to confirm the reason. If it was an OOMKill, then you should increase the memory allocation in the ServingRuntime.
Even with the runtime container restarting, ModelMesh should be able to recover and load the model in another pod, but I think the connection errors between mm pods are preventing that from happening (and I'm not sure why those are happening).
I would try:
kubectl rollout restart deployment modelmesh-serving-triton-2.x
Will do.
Hi @tjohnson31415, it's interesting: I was able to get the model metadata yesterday, but with the same code and setup I am now getting a "Nowhere available to load" error:
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()
input_path = "."
# load the needed input (as needed by ONNX)
dataloader = joblib.load(os.path.join(input_path, "eval_dataloader.pkl"))
# pick a sample batch as sample input
batches = [batch for batch in dataloader]
_batch = batches[0]
# inputs needed for the model
#b_input_ids = _batch[0].to('cpu').long().to(torch.int64)
b_input_ids = _batch[0].to('cpu')
b_input_ids = to_numpy(b_input_ids)
#b_input_mask = _batch[1].to('cpu').long().to(torch.int64)
b_input_mask = _batch[1].to('cpu')
b_input_mask = to_numpy(b_input_mask)
# Send request to the server
grpc_channel = grpc.insecure_channel("localhost:8033")
grpc_stub = service_pb2_grpc.GRPCInferenceServiceStub(grpc_channel)
model_name = "pipeline-poc-inference"
model_version = ""
request = service_pb2.ModelMetadataRequest(name=model_name, version=model_version)
response = grpc_stub.ModelMetadata(request)
print("model metadata:\n{}".format(response))
error:
Traceback (most recent call last):
File "test.py", line 42, in <module>
response = grpc_stub.ModelMetadata(request)
File "/home/hafizur/miniconda3/envs/onnx/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/hafizur/miniconda3/envs/onnx/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Nowhere available to load"
debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8033 {grpc_message:"Nowhere available to load", grpc_status:13, created_time:"2023-02-23T16:30:31.2467194-05:00"}"
@MLHafizur did you check whether the Triton container crashed/restarted again? This is probably what the "Nowhere available to load" error indicates.
@njhill The pods are up and running, with no restarts or crashes. I also increased the resources and restarted the etcd pod; nothing helped. We are blocked here. Any help would be appreciated.
Hmm, it is interesting that the model worked before and cannot load at all now. Could you try loading one of the sample models (or another model that you know has worked for you) to see whether the load/inference failures are particular to this model?
Another idea to try would be to create a copy of the InferenceService with a new name to see if that can load. This would check if there are internal references to the model that are not being cleaned up when it is deleted (which shouldn't happen, but 🤷).
If no models can load, it might be time to try a full re-install and see if this situation is reproducible.
Hello @MLHafizur. Do you have any updates on this issue? Are you still experiencing the "Nowhere available to load" errors?
Hi @tjohnson31415, unfortunately the above suggestions did not work. It is also not possible for us to re-install ModelMesh, so we are still getting the same error.
I exported a PyTorch (model.pt) model to ONNX:
I deployed the model on ModelMesh successfully. Now I am trying to build a Python gRPC client:
With this script I am struggling to get the output name, so I am getting the following error:
Am I heading in the right direction? Is there any way to get the correct output name the model is expecting? Thanks for your help!!
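For reference, the output name can also be read from the exported graph itself, as suggested in the first reply at the top of this thread; a minimal sketch (the path is a placeholder):

import onnx

# Load the exported graph and list its input/output tensor names; these are
# the names the v2 inference request/response will use.
model = onnx.load("path/to/model.onnx")  # placeholder path
print([i.name for i in model.graph.input])   # e.g. ['input.1', 'attention_mask']
print([o.name for o in model.graph.output])  # e.g. ['1643']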