zengqingfu1442 opened this issue 10 months ago (status: Open)
I used curl, but got the same error:
curl -X POST http://172.16.11.33:8000/v2/repository/models/mymodel/load -d '{"parameters": {"config": { "name": "mymodel", "backend": "python", "inputs": [{"name": "prompt", "datatype": "TYPE_STRING", "dims": [ 1 ]}], "outputs": [{"name": "generated_text", "datatype": "TYPE_STRING", "dims": [ 1 ]}], "instance_group": [{"count": 1, "kind": "KIND_GPU", "gpus": [ 1 ]}] }}}'
{"error":"attempt to access JSON non-string as string"}
I used the following JSON and successfully loaded the model, but I found that the model was not loaded as the JSON specified: its instance_group.passive is still false, not the same as what I gave in the following JSON.
{
"name": "mymodel",
"platform": "",
"backend": "python",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 0,
"input": [
{
"name": "prompt",
"data_type": "TYPE_STRING",
"format": "FORMAT_NONE",
"dims": [
1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "generated_text",
"data_type": "TYPE_STRING",
"dims": [
1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "mymodel_0",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
1
],
"secondary_devices": [],
"profile": [],
"passive": true,
"host_policy": ""
}
],
"default_model_filename": "model.py",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": [],
"model_transaction_policy": {
"decoupled": false
}
}
Hi @zengqingfu1442, I was able to pass the model config as JSON via the HTTP load API with "passive": true. I think it could be a format issue in the HTTP payload. Would you be able to use the HTTP client?
https://github.com/triton-inference-server/client/blob/main/src/python/library/tritonclient/http/_client.py#L614
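For reference, a rough sketch of loading the model with an explicit config through that Python HTTP client might look like the following (the server address and the shortened config dict are assumptions based on this thread, not the exact code used):

import json
import tritonclient.http as httpclient

# Shortened version of the config posted above (assumed for illustration).
config = {
    "name": "mymodel",
    "backend": "python",
    "input": [{"name": "prompt", "data_type": "TYPE_STRING", "dims": [1]}],
    "output": [{"name": "generated_text", "data_type": "TYPE_STRING", "dims": [1]}],
    "instance_group": [{"count": 1, "kind": "KIND_GPU", "gpus": [1], "passive": True}],
}

# Assumed server address; adjust to your deployment.
client = httpclient.InferenceServerClient(url="localhost:8000")
# load_model accepts the config as a JSON string and embeds it in the
# load request payload for you.
client.load_model("mymodel", config=json.dumps(config))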
I used curl to call the API. OK, I will try the Triton Python client you provided in this link.
I tried this way and it works for me! Thanks. It seems that using the CLI command curl to call the API is different from using the Triton client.
@kthui I can use the Triton Python client to successfully load the model, but when I then use curl to send an inference request, the tritonserver process crashes at once.
I1128 11:15:01.563631 2651 model_lifecycle.cc:818] successfully loaded 'mymodel'
Signal (11) received.
0# 0x000055B9F1E5F13D in /opt/tritonserver/bin/tritonserver
1# 0x0000152A331A2520 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# TRITONSERVER_ServerInferAsync in /opt/tritonserver/bin/../lib/libtritonserver.so
3# 0x000055B9F1FBCFDA in /opt/tritonserver/bin/tritonserver
4# 0x000055B9F1FBFEAB in /opt/tritonserver/bin/tritonserver
5# 0x000055B9F2587175 in /opt/tritonserver/bin/tritonserver
6# 0x000055B9F258B9D5 in /opt/tritonserver/bin/tritonserver
7# 0x000055B9F2589D8E in /opt/tritonserver/bin/tritonserver
8# 0x000055B9F2598DF0 in /opt/tritonserver/bin/tritonserver
9# 0x000055B9F25A1720 in /opt/tritonserver/bin/tritonserver
10# 0x000055B9F25A2197 in /opt/tritonserver/bin/tritonserver
11# 0x000055B9F258DD62 in /opt/tritonserver/bin/tritonserver
12# 0x0000152A331F4AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
13# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
I1128 11:15:21.074413 2886 pb_stub.cc:1815] Non-graceful termination detected.
I1128 11:15:21.356258 2882 pb_stub.cc:1815] Non-graceful termination detected.
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node 275475e266c6 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
@zengqingfu1442, can you share the full curl command which triggered the crash?
curl -X POST localhost:8000/v2/models/mymodel/generate -d '{"prompt": "<system></system><user>140+10*2等于几?乘法和加法的优先级哪一个更高?</user><assistent>"}'
This is strange, I tried the command and was not able to replicate any issue:
$ curl -X POST localhost:8000/v2/models/string/generate -d '{"INPUT0": "<system></system><user>140+10*2等于几?乘法和加法的优先级哪一个更高?</user><assistent>"}'
{"OUTPUT0":"<system></system><user>140+10*2等于几?乘法和加法的优先级哪一个更高?</user><assistent>","model_name":"string","model_version":"1"}
$
The model I used was a Python string identity model, which is why the output is the same as the input.
Would you be able to share the bytes that were sent to the server that triggered the crash?
If you change the model.py to the Python string identity model, can you still replicate the crash?
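A minimal model.py for such a string identity model might look roughly like this (a sketch that assumes the INPUT0/OUTPUT0 tensor names used in the curl test above):

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the string input and return it unchanged as the output.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out_tensor = pb_utils.Tensor("OUTPUT0", in_tensor.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses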
I tried this model and it runs on CPU, and the tritonserver didn't crash after the same steps that I did before.
Here is my customized model named mymodel, using the python backend: https://gist.github.com/zengqingfu1442/6613d47cc119029b4d954509aa412171
I think the crash happened inside the TRT-LLM backend. You were seeing traces back to Triton because the TRT-LLM backend uses Triton internally, and the Triton that crashed was launched by mpirun; see
mpirun noticed that process rank 0 with PID 0 on node 275475e266c6 exited on signal 11 (Segmentation fault).
in your log.
I will transfer your issue to the TRT-LLM team for them to take a look at it.
But the customized model mymodel used the python backend rather than the trt-llm backend; I am a little confused.
So if I want to use curl rather than the HTTP client to call the load model API of tritonserver, how should I write the curl command and format the HTTP payload? Thanks.
I used the following JSON with curl to call the API, but it failed with the error "failed to parse the request JSON buffer: Invalid escape character in string. at 42":
{
"parameters": {
"config": "{
"name": "mymodel",
"platform": "",
"backend": "python",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 0,
"input": [
{
"name": "prompt",
"data_type": "TYPE_STRING",
"format": "FORMAT_NONE",
"dims": [
1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "generated_text",
"data_type": "TYPE_STRING",
"dims": [
1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "mymodel_0",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
1
],
"secondary_devices": [],
"profile": [],
"passive": true,
"host_policy": ""
}
],
"default_model_filename": "model.py",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": [],
"model_transaction_policy": {
"decoupled": false
}
}"
}
}
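Judging from the parse errors in this thread, the load API seems to expect the value of parameters.config to be a single JSON-encoded string rather than a nested JSON object, so the quotes inside the config have to be escaped. A rough sketch of such a curl command, using a shortened version of the config above (the trimmed fields and server address are assumptions, not a verified answer from the maintainers):

curl -X POST http://localhost:8000/v2/repository/models/mymodel/load \
  -d '{"parameters": {"config": "{\"name\": \"mymodel\", \"backend\": \"python\", \"input\": [{\"name\": \"prompt\", \"data_type\": \"TYPE_STRING\", \"dims\": [1]}], \"output\": [{\"name\": \"generated_text\", \"data_type\": \"TYPE_STRING\", \"dims\": [1]}], \"instance_group\": [{\"count\": 1, \"kind\": \"KIND_GPU\", \"gpus\": [1], \"passive\": true}]}"}}'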
Description
Triton Information
What version of Triton are you using? Triton image: nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3; tritonserver version: 2.39.0.
Are you using the Triton container or did you build it yourself? I'm using the nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 docker image.
To Reproduce
There is a mymodel folder under /triton_model_repo, and there are 1/model.py and config.pbtxt under /triton_model_repo/mymodel.
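That is, the model repository layout is:

/triton_model_repo/
└── mymodel/
    ├── config.pbtxt
    └── 1/
        └── model.py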
mymodel uses the python backend; it is not an ensemble model. The following is the content of /triton_model_repo/mymodel/config.pbtxt:
Expected behavior
mymodel can be successfully loaded.