rtalaricw closed this issue 2 years ago
Please provide the full log from when you launch the server.
You can also set FT_LOG_LEVEL=DEBUG when launching the server to print all debug messages.
@byshiue Like this: /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store FT_LOG_LEVEL=DEBUG
?
Here is the full log with --log-verbose 10:
I0909 18:03:03.597110 36 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f5396000000' with size 268435456
I0909 18:03:03.602194 36 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0909 18:03:03.602207 36 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0909 18:03:03.735177 36 model_config_utils.cc:645] Server side auto-completed config: name: "fastertransformer"
max_batch_size: 1024
input {
name: "input_ids"
data_type: TYPE_UINT32
dims: -1
}
input {
name: "start_id"
data_type: TYPE_UINT32
dims: 1
reshape {
}
optional: true
}
input {
name: "end_id"
data_type: TYPE_UINT32
dims: 1
reshape {
}
optional: true
}
input {
name: "input_lengths"
data_type: TYPE_UINT32
dims: 1
reshape {
}
}
input {
name: "request_output_len"
data_type: TYPE_UINT32
dims: -1
}
input {
name: "runtime_top_k"
data_type: TYPE_UINT32
dims: 1
reshape {
}
optional: true
}
input {
name: "runtime_top_p"
data_type: TYPE_FP32
dims: 1
reshape {
}
optional: true
}
input {
name: "beam_search_diversity_rate"
data_type: TYPE_FP32
dims: 1
reshape {
}
optional: true
}
input {
name: "temperature"
data_type: TYPE_FP32
dims: 1
reshape {
}
optional: true
}
input {
name: "len_penalty"
data_type: TYPE_FP32
dims: 1
reshape {
}
optional: true
}
input {
name: "repetition_penalty"
data_type: TYPE_FP32
dims: 1
reshape {
}
optional: true
}
input {
name: "random_seed"
data_type: TYPE_UINT64
dims: 1
reshape {
}
optional: true
}
input {
name: "is_return_log_probs"
data_type: TYPE_BOOL
dims: 1
reshape {
}
optional: true
}
input {
name: "beam_width"
data_type: TYPE_UINT32
dims: 1
reshape {
}
optional: true
}
input {
name: "bad_words_list"
data_type: TYPE_INT32
dims: 2
dims: -1
optional: true
}
input {
name: "stop_words_list"
data_type: TYPE_INT32
dims: 2
dims: -1
optional: true
}
input {
name: "prompt_learning_task_name_ids"
data_type: TYPE_UINT32
dims: 1
reshape {
}
optional: true
}
output {
name: "output_ids"
data_type: TYPE_UINT32
dims: -1
dims: -1
}
output {
name: "sequence_length"
data_type: TYPE_UINT32
dims: -1
}
output {
name: "cum_log_probs"
data_type: TYPE_FP32
dims: -1
}
output {
name: "output_log_probs"
data_type: TYPE_FP32
dims: -1
dims: -1
}
default_model_filename: "gptneox_20b"
parameters {
key: "data_type"
value {
string_value: "fp32"
}
}
parameters {
key: "enable_custom_all_reduce"
value {
string_value: "0"
}
}
parameters {
key: "model_checkpoint_path"
value {
string_value: "/mnt/pvc/triton-model-store/fastertransformer/1/"
}
}
parameters {
key: "model_type"
value {
string_value: "GPT-NeoX"
}
}
parameters {
key: "pipeline_para_size"
value {
string_value: "1"
}
}
parameters {
key: "tensor_para_size"
value {
string_value: "2"
}
}
backend: "fastertransformer"
model_transaction_policy {
}
I0909 18:03:03.738451 36 model_repository_manager.cc:898] AsyncLoad() 'fastertransformer'
I0909 18:03:03.738517 36 model_repository_manager.cc:1136] TriggerNextAction() 'fastertransformer' version 1: 1
I0909 18:03:03.738528 36 model_repository_manager.cc:1172] Load() 'fastertransformer' version 1
I0909 18:03:03.738534 36 model_repository_manager.cc:1191] loading: fastertransformer:1
I0909 18:03:03.838733 36 model_repository_manager.cc:1249] CreateModel() 'fastertransformer' version 1
I0909 18:03:03.838849 36 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0909 18:03:03.838879 36 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so
I0909 18:03:04.188234 36 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
I0909 18:03:04.188269 36 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.9
I0909 18:03:04.188274 36 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.9
I0909 18:03:04.188324 36 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I0909 18:03:04.189340 36 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0909 18:03:04.189353 36 model_config_utils.cc:1599] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0909 18:03:04.189358 36 model_config_utils.cc:1599] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0909 18:03:04.189361 36 model_config_utils.cc:1599] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0909 18:03:04.189365 36 model_config_utils.cc:1599] ModelConfig::ensemble_scheduling::step::model_version
I0909 18:03:04.189370 36 model_config_utils.cc:1599] ModelConfig::input::dims
I0909 18:03:04.189373 36 model_config_utils.cc:1599] ModelConfig::input::reshape::shape
I0909 18:03:04.189377 36 model_config_utils.cc:1599] ModelConfig::instance_group::secondary_devices::device_id
I0909 18:03:04.189382 36 model_config_utils.cc:1599] ModelConfig::model_warmup::inputs::value::dims
I0909 18:03:04.189387 36 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0909 18:03:04.189391 36 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0909 18:03:04.189396 36 model_config_utils.cc:1599] ModelConfig::output::dims
I0909 18:03:04.189402 36 model_config_utils.cc:1599] ModelConfig::output::reshape::shape
I0909 18:03:04.189407 36 model_config_utils.cc:1599] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0909 18:03:04.189412 36 model_config_utils.cc:1599] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0909 18:03:04.189417 36 model_config_utils.cc:1599] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0909 18:03:04.189422 36 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::dims
I0909 18:03:04.189426 36 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::initial_state::dims
I0909 18:03:04.189430 36 model_config_utils.cc:1599] ModelConfig::version_policy::specific::versions
W0909 18:03:04.189708 36 libfastertransformer.cc:149] model configuration:
{
"name": "fastertransformer",
"platform": "",
"backend": "fastertransformer",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 1024,
"input": [
{
"name": "input_ids",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "start_id",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "end_id",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "input_lengths",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "request_output_len",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "runtime_top_k",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "runtime_top_p",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "beam_search_diversity_rate",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "temperature",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "len_penalty",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "repetition_penalty",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "random_seed",
"data_type": "TYPE_UINT64",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "is_return_log_probs",
"data_type": "TYPE_BOOL",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "beam_width",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "bad_words_list",
"data_type": "TYPE_INT32",
"format": "FORMAT_NONE",
"dims": [
2,
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "stop_words_list",
"data_type": "TYPE_INT32",
"format": "FORMAT_NONE",
"dims": [
2,
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "prompt_learning_task_name_ids",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
}
],
"output": [
{
"name": "output_ids",
"data_type": "TYPE_UINT32",
"dims": [
-1,
-1
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "sequence_length",
"data_type": "TYPE_UINT32",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "cum_log_probs",
"data_type": "TYPE_FP32",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "output_log_probs",
"data_type": "TYPE_FP32",
"dims": [
-1,
-1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "fastertransformer",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0,
1
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "gptneox_20b",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"data_type": {
"string_value": "fp32"
},
"enable_custom_all_reduce": {
"string_value": "0"
},
"model_type": {
"string_value": "GPT-NeoX"
},
"model_checkpoint_path": {
"string_value": "/mnt/pvc/triton-model-store/fastertransformer/1/"
},
"pipeline_para_size": {
"string_value": "1"
},
"tensor_para_size": {
"string_value": "2"
}
},
"model_warmup": [],
"model_transaction_policy": {
"decoupled": false
}
}
I0909 18:03:04.191573 36 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer (device 0)
I0909 18:03:04.192350 36 backend_model_instance.cc:105] Creating instance fastertransformer on GPU 0 (8.6) using artifact 'gptneox_20b'
W0909 18:03:04.257650 36 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
W0909 18:03:04.257706 36 libfastertransformer.cc:459] Model name gptneox_20b
W0909 18:03:04.257726 36 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
W0909 18:03:04.257735 36 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257740 36 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257745 36 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257750 36 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
W0909 18:03:04.257755 36 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257758 36 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257763 36 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257768 36 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257774 36 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257779 36 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257783 36 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_UINT64, shape: [1]
W0909 18:03:04.257789 36 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
W0909 18:03:04.257793 36 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257799 36 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
W0909 18:03:04.257803 36 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
W0909 18:03:04.257807 36 libfastertransformer.cc:578] Get input name: prompt_learning_task_name_ids, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257814 36 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
W0909 18:03:04.257819 36 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
W0909 18:03:04.257823 36 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
W0909 18:03:04.257827 36 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:36 :0:40] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 40) ====
0 0x00000000000143c0 __funlockfile() ???:0
1 0x000000000001ef3a triton::backend::fastertransformer_backend::ModelInstanceState::ModelInstanceState() :0
2 0x000000000001fd42 triton::backend::fastertransformer_backend::ModelInstanceState::Create() :0
3 0x000000000002263c TRITONBACKEND_ModelInstanceInitialize() ???:0
4 0x000000000010ce8a triton::core::TritonModelInstance::CreateInstance() :0
5 0x000000000010e971 triton::core::TritonModelInstance::CreateInstances() :0
6 0x0000000000101a10 triton::core::TritonModel::Create() :0
7 0x00000000001b217a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel() :0
8 0x00000000001c0fa1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::core::Status (triton::core::ModelRepositoryManager::ModelLifeCycle::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*), triton::core::ModelRepositoryManager::ModelLifeCycle*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*> > >::_M_run() :0
9 0x00000000000d6de4 std::error_code::default_error_condition() ???:0
10 0x0000000000008609 start_thread() ???:0
11 0x000000000011f163 clone() ???:0
=================================
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] *** Process received signal ***
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] Signal: Segmentation fault (11)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] Signal code: (-6)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] Failing at address: 0x24
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f53ebeeb3c0]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 1] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1ef3a)[0x7f53e19bbf3a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 2] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1fd42)[0x7f53e19bcd42]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 3] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInstanceInitialize+0x38c)[0x7f53e19bf63c]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 4] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10ce8a)[0x7f53eb18ee8a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 5] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10e971)[0x7f53eb190971]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 6] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101a10)[0x7f53eb183a10]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 7] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b217a)[0x7f53eb23417a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 8] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c0fa1)[0x7f53eb242fa1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 9] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f53eacd1de4]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [10] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f53ebedf609]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f53ea9bc163]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] *** End of error message ***
Segmentation fault (core dumped)
@byshiue Like this:
/opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store FT_LOG_LEVEL=DEBUG
?
No. Please run it as:
FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store
Also, I want to make sure you are using the latest main branch.
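The placement matters here: appended after the binary, FT_LOG_LEVEL=DEBUG is parsed as a positional command-line argument and ignored; placed before the command, the shell exports it into the server's environment. A minimal demonstration of the difference, using sh as a stand-in for tritonserver:

```shell
# Wrong: the assignment comes after the command, so it is just an argument:
#   /opt/tritonserver/bin/tritonserver --model-repository=... FT_LOG_LEVEL=DEBUG
# Right: a leading VAR=value assignment goes into the child's environment:
#   FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=...

# Stand-in demonstration: the child process sees the variable.
FT_LOG_LEVEL=DEBUG sh -c 'echo "FT_LOG_LEVEL=$FT_LOG_LEVEL"'
```

This is standard POSIX shell behavior for per-command environment assignments, not anything specific to Triton.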
I am. I just recently pulled v1.2 and built a new image.
@byshiue Here are the logs from running with that command (they are the same as above):
FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store
root@fastertransformer-triton-predictor-default-00001-deploymen2j9nd:/opt/tritonserver# FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store
I0910 00:33:36.186877 73 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc09e000000' with size 268435456
I0910 00:33:36.191759 73 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0910 00:33:36.191772 73 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0910 00:33:36.527831 73 model_repository_manager.cc:1191] loading: fastertransformer:1
I0910 00:33:36.881606 73 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
I0910 00:33:36.881642 73 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.9
I0910 00:33:36.881650 73 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.9
I0910 00:33:36.881702 73 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
W0910 00:33:36.882960 73 libfastertransformer.cc:149] model configuration:
{
"name": "fastertransformer",
"platform": "",
"backend": "fastertransformer",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 1024,
"input": [
{
"name": "input_ids",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "start_id",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "end_id",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "input_lengths",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "request_output_len",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
},
{
"name": "runtime_top_k",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "runtime_top_p",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "beam_search_diversity_rate",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "temperature",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "len_penalty",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "repetition_penalty",
"data_type": "TYPE_FP32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "random_seed",
"data_type": "TYPE_UINT64",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "is_return_log_probs",
"data_type": "TYPE_BOOL",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "beam_width",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "bad_words_list",
"data_type": "TYPE_INT32",
"format": "FORMAT_NONE",
"dims": [
2,
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "stop_words_list",
"data_type": "TYPE_INT32",
"format": "FORMAT_NONE",
"dims": [
2,
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
},
{
"name": "prompt_learning_task_name_ids",
"data_type": "TYPE_UINT32",
"format": "FORMAT_NONE",
"dims": [
1
],
"reshape": {
"shape": []
},
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": true
}
],
"output": [
{
"name": "output_ids",
"data_type": "TYPE_UINT32",
"dims": [
-1,
-1
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "sequence_length",
"data_type": "TYPE_UINT32",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "cum_log_probs",
"data_type": "TYPE_FP32",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "output_log_probs",
"data_type": "TYPE_FP32",
"dims": [
-1,
-1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"instance_group": [
{
"name": "fastertransformer",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0,
1
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "gptneox_20b",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"tensor_para_size": {
"string_value": "2"
},
"data_type": {
"string_value": "fp32"
},
"enable_custom_all_reduce": {
"string_value": "0"
},
"model_type": {
"string_value": "GPT-NeoX"
},
"model_checkpoint_path": {
"string_value": "/mnt/pvc/triton-model-store/fastertransformer/1/"
},
"pipeline_para_size": {
"string_value": "1"
}
},
"model_warmup": [],
"model_transaction_policy": {
"decoupled": false
}
}
I0910 00:33:36.884884 73 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer (device 0)
W0910 00:33:36.953299 73 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
W0910 00:33:36.953344 73 libfastertransformer.cc:459] Model name gptneox_20b
W0910 00:33:36.953364 73 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
W0910 00:33:36.953370 73 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953376 73 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953380 73 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953385 73 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
W0910 00:33:36.953391 73 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953398 73 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953407 73 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953415 73 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953422 73 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953430 73 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953436 73 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_UINT64, shape: [1]
W0910 00:33:36.953441 73 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
W0910 00:33:36.953447 73 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953456 73 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
W0910 00:33:36.953464 73 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
W0910 00:33:36.953471 73 libfastertransformer.cc:578] Get input name: prompt_learning_task_name_ids, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953482 73 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
W0910 00:33:36.953490 73 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
W0910 00:33:36.953497 73 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
W0910 00:33:36.953501 73 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:73 :0:78] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 78) ====
0 0x00000000000143c0 __funlockfile() ???:0
1 0x000000000001ef3a triton::backend::fastertransformer_backend::ModelInstanceState::ModelInstanceState() :0
2 0x000000000001fd42 triton::backend::fastertransformer_backend::ModelInstanceState::Create() :0
3 0x000000000002263c TRITONBACKEND_ModelInstanceInitialize() ???:0
4 0x000000000010ce8a triton::core::TritonModelInstance::CreateInstance() :0
5 0x000000000010e971 triton::core::TritonModelInstance::CreateInstances() :0
6 0x0000000000101a10 triton::core::TritonModel::Create() :0
7 0x00000000001b217a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel() :0
8 0x00000000001c0fa1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::core::Status (triton::core::ModelRepositoryManager::ModelLifeCycle::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*), triton::core::ModelRepositoryManager::ModelLifeCycle*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*> > >::_M_run() :0
9 0x00000000000d6de4 std::error_code::default_error_condition() ???:0
10 0x0000000000008609 start_thread() ???:0
11 0x000000000011f163 clone() ???:0
=================================
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] *** Process received signal ***
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] Signal: Segmentation fault (11)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] Signal code: (-6)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] Failing at address: 0x49
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7fc0eeed13c0]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 1] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1ef3a)[0x7fc0e099ef3a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 2] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1fd42)[0x7fc0e099fd42]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 3] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInstanceInitialize+0x38c)[0x7fc0e09a263c]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 4] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10ce8a)[0x7fc0ee172e8a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 5] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10e971)[0x7fc0ee174971]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 6] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101a10)[0x7fc0ee167a10]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 7] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b217a)[0x7fc0ee21817a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 8] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c0fa1)[0x7fc0ee226fa1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 9] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7fc0edcb5de4]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [10] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fc0eeec5609]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fc0ed9a0163]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] *** End of error message ***
Segmentation fault (core dumped)
Did you build the Docker image from the latest main branch?
Yes, we are using the latest v1.2. It works fine with GPT-J.
Are you able to replicate this error or does the latest v1.2 build work with GPT-NeoX?
Yes, we are using the latest v1.2. It works fine with GPT-J.
The latest main branch and v1.2 are a little different. We have fixed some issues recently. Can you try the main branch directly?
@byshiue I rebuilt it using the latest code on the main branch. I use the 22.07 base image, but now I am facing this error. Any help would be appreciated. Do you happen to know which CUDA driver version the base image expects, and which versions work together?
CUDA toolkit version (from nvcc --version):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Error:
what(): [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/triton-experiments/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:498
Here are the full logs:
I0913 19:03:15.787037 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1204000000' with size 268435456
I0913 19:03:15.792745 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0913 19:03:15.805911 1 model_repository_manager.cc:1206] loading: fastertransformer:1
I0913 19:03:15.957966 1 libfastertransformer.cc:1478] TRITONBACKEND_Initialize: fastertransformer
I0913 19:03:15.957999 1 libfastertransformer.cc:1488] Triton TRITONBACKEND API version: 1.10
I0913 19:03:15.958004 1 libfastertransformer.cc:1494] 'fastertransformer' TRITONBACKEND API version: 1.10
I0913 19:03:15.958050 1 libfastertransformer.cc:1526] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I0913 19:03:15.959302 1 libfastertransformer.cc:218] Instance group type: KIND_CPU count: 1
I0913 19:03:15.959317 1 libfastertransformer.cc:248] Sequence Batching: disabled
E0913 19:03:15.959324 1 libfastertransformer.cc:324] Invalid configuration argument 'data_type':
I0913 19:03:15.959327 1 libfastertransformer.cc:420] Before Loading Weights:
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/triton-experiments/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:498
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** Process received signal ***
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal: Aborted (6)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal code: (-6)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f1253a61420]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f125244d00b]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f125242c859]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f1252806911]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f125281238c]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f12528123f7]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f12528126a9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2f6d9)[0x7f124135f6d9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x276b5)[0x7f12413576b5]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x29af2)[0x7f1241359af2]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f124135a071]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101c52)[0x7f1252cf0c52]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b6b9a)[0x7f1252da5b9a]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b7332)[0x7f1252da6332]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x30d780)[0x7f1252efc780]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f125283ede4]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [16] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f1253a55609]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [17] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f1252529133]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** End of error message ***
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:1 :0:10] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 10) ====
0 0x0000000000014420 __funlockfile() ???:0
1 0x0000000000022941 abort() ???:0
2 0x000000000009e911 __cxa_throw_bad_array_new_length() ???:0
3 0x00000000000aa38c std::rethrow_exception() ???:0
4 0x00000000000aa3f7 std::terminate() ???:0
5 0x00000000000aa6a9 __cxa_throw() ???:0
6 0x000000000002f6d9 fastertransformer::check<cudaError>() :0
7 0x00000000000276b5 triton::backend::fastertransformer_backend::ModelState::ModelState() :0
8 0x0000000000029af2 triton::backend::fastertransformer_backend::ModelState::Create() :0
9 0x000000000002a071 TRITONBACKEND_ModelInitialize() ???:0
10 0x0000000000101c52 triton::core::TritonModel::Create() :0
11 0x00000000001b6b9a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel() :0
12 0x00000000001b7332 std::_Function_handler<void (), triton::core::ModelRepositoryManager::ModelLifeCycle::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*)::{lambda()#1}>::_M_invoke() model_repository_manager.cc:0
13 0x000000000030d780 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run() thread_pool.cc:0
14 0x00000000000d6de4 std::error_code::default_error_condition() ???:0
15 0x0000000000008609 start_thread() ???:0
16 0x000000000011f133 clone() ???:0
=================================
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** Process received signal ***
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal: Segmentation fault (11)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal code: (-6)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Failing at address: 0x1
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f1253a61420]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7f125242c941]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 2] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f1252806911]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f125281238c]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f12528123f7]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f12528126a9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 6] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2f6d9)[0x7f124135f6d9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x276b5)[0x7f12413576b5]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x29af2)[0x7f1241359af2]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f124135a071]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [10] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101c52)[0x7f1252cf0c52]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b6b9a)[0x7f1252da5b9a]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b7332)[0x7f1252da6332]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x30d780)[0x7f1252efc780]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [14] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f125283ede4]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [15] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f1253a55609]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [16] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f1252529133]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** End of error message ***
Can you post the driver version by running nvidia-smi? Also, what GPU do you use?
Here is the output from nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A40 On | 00000000:A1:00.0 Off | Off |
| 0% 37C P0 75W / 300W | 12493MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3060 G 25MiB |
| 0 N/A N/A 2500362 C 12465MiB |
+-----------------------------------------------------------------------------+
We are currently running on A40.
Hi, per the support matrix at https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html, the 22.07 docker image requires driver 515 or later. Can you try 22.04, which only requires driver 510?
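[Editor's note] The driver/container mismatch above ("the provided PTX was compiled with an unsupported toolchain") can be caught before launching the server by comparing the installed driver's major version against the container's minimum from the support matrix. A minimal sketch, assuming the matrix values quoted in this thread (22.07 needs 515+, 22.04 needs 510+); the threshold and sample versions are illustrative:

```shell
# Minimum driver major version required by the target container (22.04 -> 510).
required=510

# Extract the major component of a driver version string, e.g. "510.60.02" -> "510".
driver_major() {
  echo "$1" | cut -d. -f1
}

# Report whether a given driver version satisfies the container's requirement.
check_driver() {
  if [ "$(driver_major "$1")" -ge "$required" ]; then
    echo "driver $1 is sufficient"
  else
    echo "driver $1 is too old; need >= $required"
  fi
}

# On a live system, feed in the real version:
#   check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
check_driver "510.60.02"
```

The same check explains the earlier failure: a 510-series driver cannot load PTX built by the CUDA 11.7 toolchain shipped in the 22.07 image, which requires a 515-series driver.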
It works with 22.04 and 2 A40s. Closing it now.
@byshiue I am getting this error when launching Triton with GPT-NeoX.
I downloaded and converted the weights for GPT-NeoX according to the guide and set the checkpoint path appropriately. Here are my config.ini and config.pbtxt:
config.ini
config.pbtxt
Have you encountered this issue?
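[Editor's note] The log earlier in the thread also shows "Invalid configuration argument 'data_type'", which suggests the backend could not read the data type from config.pbtxt. A minimal sketch of the relevant parameters entry, assuming the fastertransformer_backend convention of passing it as a string-valued parameter ("fp16" here is illustrative; use the precision the checkpoint was converted with):

```
parameters {
  key: "data_type"
  value: {
    string_value: "fp16"
  }
}
```

If this key is missing or empty, the backend may fall back to an invalid default and fail during ModelState creation, as seen in the backtrace above.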