[Open] Kanupriyagoyal opened this issue 3 months ago
Hi @Kanupriyagoyal, try this:
{
"data": [
{
"IN0": {
"content": ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818", "5499", "Bad PIN"],
"shape": [16]
}
}
]
}
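If you want to sanity-check the model outside of perf_analyzer, here is a minimal Python gRPC client sketch, assuming the model is named xgb_model, gRPC is on localhost:8001, and the model has max_batch_size > 0 so the wire shape is [1, 16] (drop the leading 1 if batching is disabled):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

values = ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4",
          "Swipe Transaction", "-6571010470072147219", "Bloomville",
          "OH", "44818", "5499", "Bad PIN"]

# BYTES (string) inputs are passed as a numpy array with dtype=object.
data = np.array([values], dtype=object)  # shape (1, 16)

inp = grpcclient.InferInput("IN0", list(data.shape), "BYTES")
inp.set_data_from_numpy(data)

result = client.infer(model_name="xgb_model", inputs=[inp])
print(result.get_response())
```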
@nv-hwoo I tried it; the server log shows:
I0822 04:24:42.997591 111595 infer_handler.cc:975] "[request id: <id_unknown>] Infer failed: [request id: <id_unknown>] expected 16 string elements for inference input 'IN0', got 1"
I0822 04:24:42.997662 111595 infer_handler.h:1311] "Received notification for ModelInferHandler, 0"
I0822 04:24:42.997667 111595 infer_handler.cc:728] "Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE"
I0822 04:24:42.997685 111595 infer_handler.cc:728] "Process for ModelInferHandler, rpc_ok=1, 0 step FINISH"
input_suggested.json
{
"data": [
{
"IN0": {
"content": ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818", "5499", "Bad PIN"],
"shape": [16]
}
}
]
}
perf_analyzer -m xgb_model --service-kind=triton --model-repository=/models -b 1 -u localhost:8001 -i grpc -f xgb_model.csv --verbose-csv --concurrency-range 1 --measurement-mode count_windows --input-tensor-format json --input-data input_suggested.json --collect-metrics --metrics-url http://localhost:8002/metrics --metrics-interval 1000
Successfully read data for 1 stream/streams with 1 step/steps.
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "count_windows" mode for stabilization
Stabilizing using average latency and throughput
Minimum number of samples in each window: 50
Using synchronous calls for inference
Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: [request id: <id_unknown>] expected 16 string elements for inference input 'IN0', got 1
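One way to narrow this down (a sketch, not something I have run against your model) is to send the same tensor directly with the Python HTTP client and toggle binary_data, which switches between the JSON and binary request encodings. Assuming HTTP is on localhost:8000 and a wire shape of [1, 16]:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

values = ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4",
          "Swipe Transaction", "-6571010470072147219", "Bloomville",
          "OH", "44818", "5499", "Bad PIN"]
data = np.array([values], dtype=object)  # shape (1, 16)

inp = httpclient.InferInput("IN0", list(data.shape), "BYTES")
# binary_data=False sends the tensor inline as JSON; the default (True)
# uses Triton's binary tensor data extension.
inp.set_data_from_numpy(data, binary_data=False)

result = client.infer("xgb_model", inputs=[inp])
print(result.get_response())
```

If this succeeds while perf_analyzer fails, the problem is likely in how perf_analyzer packages the --input-data JSON rather than in the model itself.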
@nv-hwoo @Kanupriyagoyal
After some analysis, I identified that when we send the JSON input through HTTP to perf_analyzer, it interprets the input format as binary by default. The http_server.cc file in Triton contains specific logic to handle binary and byte data separately.
To resolve this, explicitly specify that the input format is JSON by using the following option:
--input-tensor-format json
This worked for me when my input is HTTP JSON, and my element-count issue is resolved.
(Make sure the endianness of the bytes is handled correctly too.)
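For context on the endianness remark: in Triton's binary tensor data extension, each BYTES element is length-prefixed with a 4-byte unsigned integer. A rough sketch of that layout (the little-endian packing is my assumption, which is exactly why endianness matters here):

```python
import struct

def serialize_bytes_elements(elements):
    """Length-prefix each UTF-8 string with a 4-byte unsigned int
    (little-endian assumed here, hence the endianness caveat above)."""
    out = b""
    for e in elements:
        raw = e.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw
    return out

print(serialize_bytes_elements(["17", "2", "2007"]))
# b'\x02\x00\x00\x0017\x01\x00\x00\x002\x04\x00\x00\x002007'
```

If a body that is actually JSON text gets parsed with this binary logic, the element count the server recovers will not match the 16 elements the model expects, which may be why the request above was seen as a single element.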
Triton Inference Server: r24.07 and model_analyzer: 1.42.0. config.pbtxt:
Tried with inference:
But when passing it to perf_analyzer as --input-data input.json, where the JSON looks like:
I am getting the error: Thread [0] had error: [request id:] expected 16 string elements for inference input 'IN0', got 1
or
error: Failed to init manager inputs: unable to find string data in json.
How do I need to pass string data?