triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

GPT-NeoX throws Segmentation Fault (Signal 6) #43

Closed rtalaricw closed 2 years ago

rtalaricw commented 2 years ago

@byshiue Getting this error when launching Triton with GPT-NeoX.

I0909 00:01:05.466214 1 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer (device 0)
W0909 00:01:05.534072 1 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
W0909 00:01:05.534113 1 libfastertransformer.cc:459] Model name gptneox_20b
W0909 00:01:05.534128 1 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
W0909 00:01:05.534133 1 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
W0909 00:01:05.534136 1 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
W0909 00:01:05.534140 1 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
W0909 00:01:05.534144 1 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
W0909 00:01:05.534148 1 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
W0909 00:01:05.534151 1 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
W0909 00:01:05.534156 1 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
W0909 00:01:05.534160 1 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
W0909 00:01:05.534166 1 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
W0909 00:01:05.534170 1 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
W0909 00:01:05.534175 1 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_UINT64, shape: [1]
W0909 00:01:05.534179 1 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
W0909 00:01:05.534184 1 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
W0909 00:01:05.534189 1 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
W0909 00:01:05.534193 1 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
W0909 00:01:05.534197 1 libfastertransformer.cc:578] Get input name: prompt_learning_task_name_ids, type: TYPE_UINT32, shape: [1]
W0909 00:01:05.534204 1 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
W0909 00:01:05.534210 1 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
W0909 00:01:05.534214 1 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
W0909 00:01:05.534219 1 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:1    :0:11] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:     11) ====
 0 0x00000000000143c0 __funlockfile()  ???:0
 1 0x000000000001ef3a triton::backend::fastertransformer_backend::ModelInstanceState::ModelInstanceState()  :0
 2 0x000000000001fd42 triton::backend::fastertransformer_backend::ModelInstanceState::Create()  :0
 3 0x000000000002263c TRITONBACKEND_ModelInstanceInitialize()  ???:0
 4 0x000000000010ce8a triton::core::TritonModelInstance::CreateInstance()  :0
 5 0x000000000010e971 triton::core::TritonModelInstance::CreateInstances()  :0
 6 0x0000000000101a10 triton::core::TritonModel::Create()  :0
 7 0x00000000001b217a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel()  :0
 8 0x00000000001c0fa1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::core::Status (triton::core::ModelRepositoryManager::ModelLifeCycle::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*), triton::core::ModelRepositoryManager::ModelLifeCycle*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*> > >::_M_run()  :0
 9 0x00000000000d6de4 std::error_code::default_error_condition()  ???:0
10 0x0000000000008609 start_thread()  ???:0
11 0x000000000011f163 clone()  ???:0
=================================
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] *** Process received signal ***
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] Signal: Segmentation fault (11)
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] Signal code:  (-6)
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] Failing at address: 0x1
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f3d34fa43c0]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 1] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1ef3a)[0x7f3d22a70f3a]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 2] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1fd42)[0x7f3d22a71d42]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 3] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInstanceInitialize+0x38c)[0x7f3d22a7463c]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 4] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10ce8a)[0x7f3d34245e8a]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 5] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10e971)[0x7f3d34247971]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 6] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101a10)[0x7f3d3423aa10]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 7] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b217a)[0x7f3d342eb17a]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 8] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c0fa1)[0x7f3d342f9fa1]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [ 9] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f3d33d88de4]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [10] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f3d34f98609]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f3d33a73163]
[fastertransformer-triton-predictor-default-00001-deploymenknpmt:00001] *** End of error message ***

I downloaded and converted the weights for GPT-NeoX according to the guide and set the checkpoint path appropriately. Here are my config.ini and config.pbtxt:

config.ini

[gptneox]
model_name=gptneox_20B
head_num=64
size_per_head=96
vocab_size=50432
num_layer=44
rotary_embedding=24
start_id=0
end_id=2
inter_size=24576
use_gptj_residual=1
weight_data_type=fp32
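
As a quick sanity check on these values (my own sketch, not from the guide): head_num * size_per_head = 64 * 96 = 6144 is the GPT-NeoX 20B hidden size, and inter_size = 24576 is 4x that. A minimal Python check, assuming config.ini sits in the checkpoint directory referenced in config.pbtxt below:

import configparser

# Path is an assumption: the converter writes config.ini into the checkpoint directory.
cfg = configparser.ConfigParser()
cfg.read("/mnt/pvc/triton-model-store/fastertransformer/1/config.ini")

g = cfg["gptneox"]
hidden_size = int(g["head_num"]) * int(g["size_per_head"])   # 64 * 96 = 6144

# GPT-NeoX 20B: 44 layers, hidden size 6144, FFN size 4 * 6144 = 24576, vocab 50432.
assert hidden_size == 6144, hidden_size
assert int(g["inter_size"]) == 4 * hidden_size
assert int(g["num_layer"]) == 44
assert int(g["vocab_size"]) == 50432
print("config.ini dimensions are consistent with GPT-NeoX 20B")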

config.pbtxt

# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: "fastertransformer"
backend: "fastertransformer"
default_model_filename: "gptneox_20b"
max_batch_size: 1024

model_transaction_policy {
decoupled: False
}

input [
{
    name: "input_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
},
{
    name: "start_id"
    data_type: TYPE_UINT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "end_id"
    data_type: TYPE_UINT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "input_lengths"
    data_type: TYPE_UINT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
},
{
    name: "request_output_len"
    data_type: TYPE_UINT32
    dims: [ -1 ]
},
{
    name: "runtime_top_k"
    data_type: TYPE_UINT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "runtime_top_p"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "beam_search_diversity_rate"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "temperature"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "len_penalty"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "repetition_penalty"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "random_seed"
    data_type: TYPE_UINT64
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "is_return_log_probs"
    data_type: TYPE_BOOL
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "beam_width"
    data_type: TYPE_UINT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
},
{
    name: "bad_words_list"
    data_type: TYPE_INT32
    dims: [ 2, -1 ]
    optional: true
},
{
    name: "stop_words_list"
    data_type: TYPE_INT32
    dims: [ 2, -1 ]
    optional: true
},
{
    name: "prompt_learning_task_name_ids"
    data_type: TYPE_UINT32
    dims: [ 1 ]
    reshape: { shape: [ ] }
    optional: true
}
]
output [
{
    name: "output_ids"
    data_type: TYPE_UINT32
    dims: [ -1, -1 ]
},
{
    name: "sequence_length"
    data_type: TYPE_UINT32
    dims: [ -1 ]
},
{
    name: "cum_log_probs"
    data_type: TYPE_FP32
    dims: [ -1 ]
},
{
    name: "output_log_probs"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
}
]
parameters {
key: "tensor_para_size"
value: {
    string_value: "2"
}
}
parameters {
key: "pipeline_para_size"
value: {
    string_value: "1"
}
}
parameters {
key: "data_type"
value: {
    string_value: "fp32"
}
}
parameters {
key: "model_type"
value: {
    string_value: "GPT-NeoX"
}
}
parameters {
key: "model_checkpoint_path"
value: {
    string_value: "/mnt/pvc/triton-model-store/fastertransformer/1/"
}
}
parameters {
key: "enable_custom_all_reduce"
value: {
    string_value: "0"
}
}
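
For illustration only (not part of the original report): if this config loads, a minimal request against it with the Triton HTTP client might look like the sketch below; the server URL, token ids, and output length are placeholder values.

import numpy as np
import tritonclient.http as httpclient

# Server URL and all tensor values below are placeholders for illustration.
client = httpclient.InferenceServerClient(url="localhost:8000")

input_ids = np.array([[818, 262, 3726]], dtype=np.uint32)          # hypothetical token ids, shape [1, seq_len]
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)  # per-request input length
request_output_len = np.array([[32]], dtype=np.uint32)             # number of tokens to generate

inputs = [
    httpclient.InferInput("input_ids", list(input_ids.shape), "UINT32"),
    httpclient.InferInput("input_lengths", list(input_lengths.shape), "UINT32"),
    httpclient.InferInput("request_output_len", list(request_output_len.shape), "UINT32"),
]
for tensor, array in zip(inputs, [input_ids, input_lengths, request_output_len]):
    tensor.set_data_from_numpy(array)

result = client.infer("fastertransformer", inputs)
print(result.as_numpy("output_ids"))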

Have you encountered this issue?

byshiue commented 2 years ago

Please provide the full log from when you launch the server. You can also set FT_LOG_LEVEL=DEBUG when launching the server to print all debug messages.

rtalaricw commented 2 years ago

@byshiue Like this: /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store FT_LOG_LEVEL=DEBUG?

rtalaricw commented 2 years ago

Here is the full log with --log-verbose 10:

I0909 18:03:03.597110 36 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f5396000000' with size 268435456
I0909 18:03:03.602194 36 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0909 18:03:03.602207 36 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0909 18:03:03.735177 36 model_config_utils.cc:645] Server side auto-completed config: name: "fastertransformer"
max_batch_size: 1024
input {
  name: "input_ids"
  data_type: TYPE_UINT32
  dims: -1
}
input {
  name: "start_id"
  data_type: TYPE_UINT32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "end_id"
  data_type: TYPE_UINT32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "input_lengths"
  data_type: TYPE_UINT32
  dims: 1
  reshape {
  }
}
input {
  name: "request_output_len"
  data_type: TYPE_UINT32
  dims: -1
}
input {
  name: "runtime_top_k"
  data_type: TYPE_UINT32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "runtime_top_p"
  data_type: TYPE_FP32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "beam_search_diversity_rate"
  data_type: TYPE_FP32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "temperature"
  data_type: TYPE_FP32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "len_penalty"
  data_type: TYPE_FP32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "repetition_penalty"
  data_type: TYPE_FP32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "random_seed"
  data_type: TYPE_UINT64
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "is_return_log_probs"
  data_type: TYPE_BOOL
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "beam_width"
  data_type: TYPE_UINT32
  dims: 1
  reshape {
  }
  optional: true
}
input {
  name: "bad_words_list"
  data_type: TYPE_INT32
  dims: 2
  dims: -1
  optional: true
}
input {
  name: "stop_words_list"
  data_type: TYPE_INT32
  dims: 2
  dims: -1
  optional: true
}
input {
  name: "prompt_learning_task_name_ids"
  data_type: TYPE_UINT32
  dims: 1
  reshape {
  }
  optional: true
}
output {
  name: "output_ids"
  data_type: TYPE_UINT32
  dims: -1
  dims: -1
}
output {
  name: "sequence_length"
  data_type: TYPE_UINT32
  dims: -1
}
output {
  name: "cum_log_probs"
  data_type: TYPE_FP32
  dims: -1
}
output {
  name: "output_log_probs"
  data_type: TYPE_FP32
  dims: -1
  dims: -1
}
default_model_filename: "gptneox_20b"
parameters {
  key: "data_type"
  value {
    string_value: "fp32"
  }
}
parameters {
  key: "enable_custom_all_reduce"
  value {
    string_value: "0"
  }
}
parameters {
  key: "model_checkpoint_path"
  value {
    string_value: "/mnt/pvc/triton-model-store/fastertransformer/1/"
  }
}
parameters {
  key: "model_type"
  value {
    string_value: "GPT-NeoX"
  }
}
parameters {
  key: "pipeline_para_size"
  value {
    string_value: "1"
  }
}
parameters {
  key: "tensor_para_size"
  value {
    string_value: "2"
  }
}
backend: "fastertransformer"
model_transaction_policy {
}

I0909 18:03:03.738451 36 model_repository_manager.cc:898] AsyncLoad() 'fastertransformer'
I0909 18:03:03.738517 36 model_repository_manager.cc:1136] TriggerNextAction() 'fastertransformer' version 1: 1
I0909 18:03:03.738528 36 model_repository_manager.cc:1172] Load() 'fastertransformer' version 1
I0909 18:03:03.738534 36 model_repository_manager.cc:1191] loading: fastertransformer:1
I0909 18:03:03.838733 36 model_repository_manager.cc:1249] CreateModel() 'fastertransformer' version 1
I0909 18:03:03.838849 36 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0909 18:03:03.838879 36 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so
I0909 18:03:04.188234 36 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
I0909 18:03:04.188269 36 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.9
I0909 18:03:04.188274 36 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.9
I0909 18:03:04.188324 36 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I0909 18:03:04.189340 36 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0909 18:03:04.189353 36 model_config_utils.cc:1599]    ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0909 18:03:04.189358 36 model_config_utils.cc:1599]    ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0909 18:03:04.189361 36 model_config_utils.cc:1599]    ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0909 18:03:04.189365 36 model_config_utils.cc:1599]    ModelConfig::ensemble_scheduling::step::model_version
I0909 18:03:04.189370 36 model_config_utils.cc:1599]    ModelConfig::input::dims
I0909 18:03:04.189373 36 model_config_utils.cc:1599]    ModelConfig::input::reshape::shape
I0909 18:03:04.189377 36 model_config_utils.cc:1599]    ModelConfig::instance_group::secondary_devices::device_id
I0909 18:03:04.189382 36 model_config_utils.cc:1599]    ModelConfig::model_warmup::inputs::value::dims
I0909 18:03:04.189387 36 model_config_utils.cc:1599]    ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0909 18:03:04.189391 36 model_config_utils.cc:1599]    ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0909 18:03:04.189396 36 model_config_utils.cc:1599]    ModelConfig::output::dims
I0909 18:03:04.189402 36 model_config_utils.cc:1599]    ModelConfig::output::reshape::shape
I0909 18:03:04.189407 36 model_config_utils.cc:1599]    ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0909 18:03:04.189412 36 model_config_utils.cc:1599]    ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0909 18:03:04.189417 36 model_config_utils.cc:1599]    ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0909 18:03:04.189422 36 model_config_utils.cc:1599]    ModelConfig::sequence_batching::state::dims
I0909 18:03:04.189426 36 model_config_utils.cc:1599]    ModelConfig::sequence_batching::state::initial_state::dims
I0909 18:03:04.189430 36 model_config_utils.cc:1599]    ModelConfig::version_policy::specific::versions
W0909 18:03:04.189708 36 libfastertransformer.cc:149] model configuration:
{
    "name": "fastertransformer",
    "platform": "",
    "backend": "fastertransformer",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1024,
    "input": [
        {
            "name": "input_ids",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "start_id",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "end_id",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "input_lengths",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "request_output_len",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "runtime_top_k",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "runtime_top_p",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "beam_search_diversity_rate",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "temperature",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "len_penalty",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "repetition_penalty",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "random_seed",
            "data_type": "TYPE_UINT64",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "is_return_log_probs",
            "data_type": "TYPE_BOOL",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "beam_width",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "bad_words_list",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                2,
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "stop_words_list",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                2,
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "prompt_learning_task_name_ids",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        }
    ],
    "output": [
        {
            "name": "output_ids",
            "data_type": "TYPE_UINT32",
            "dims": [
                -1,
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "sequence_length",
            "data_type": "TYPE_UINT32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "cum_log_probs",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "output_log_probs",
            "data_type": "TYPE_FP32",
            "dims": [
                -1,
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "fastertransformer",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0,
                1
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "gptneox_20b",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "data_type": {
            "string_value": "fp32"
        },
        "enable_custom_all_reduce": {
            "string_value": "0"
        },
        "model_type": {
            "string_value": "GPT-NeoX"
        },
        "model_checkpoint_path": {
            "string_value": "/mnt/pvc/triton-model-store/fastertransformer/1/"
        },
        "pipeline_para_size": {
            "string_value": "1"
        },
        "tensor_para_size": {
            "string_value": "2"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0909 18:03:04.191573 36 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer (device 0)
I0909 18:03:04.192350 36 backend_model_instance.cc:105] Creating instance fastertransformer on GPU 0 (8.6) using artifact 'gptneox_20b'
W0909 18:03:04.257650 36 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
W0909 18:03:04.257706 36 libfastertransformer.cc:459] Model name gptneox_20b
W0909 18:03:04.257726 36 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
W0909 18:03:04.257735 36 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257740 36 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257745 36 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257750 36 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
W0909 18:03:04.257755 36 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257758 36 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257763 36 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257768 36 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257774 36 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257779 36 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
W0909 18:03:04.257783 36 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_UINT64, shape: [1]
W0909 18:03:04.257789 36 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
W0909 18:03:04.257793 36 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257799 36 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
W0909 18:03:04.257803 36 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
W0909 18:03:04.257807 36 libfastertransformer.cc:578] Get input name: prompt_learning_task_name_ids, type: TYPE_UINT32, shape: [1]
W0909 18:03:04.257814 36 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
W0909 18:03:04.257819 36 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
W0909 18:03:04.257823 36 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
W0909 18:03:04.257827 36 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:36   :0:40] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:     40) ====
 0 0x00000000000143c0 __funlockfile()  ???:0
 1 0x000000000001ef3a triton::backend::fastertransformer_backend::ModelInstanceState::ModelInstanceState()  :0
 2 0x000000000001fd42 triton::backend::fastertransformer_backend::ModelInstanceState::Create()  :0
 3 0x000000000002263c TRITONBACKEND_ModelInstanceInitialize()  ???:0
 4 0x000000000010ce8a triton::core::TritonModelInstance::CreateInstance()  :0
 5 0x000000000010e971 triton::core::TritonModelInstance::CreateInstances()  :0
 6 0x0000000000101a10 triton::core::TritonModel::Create()  :0
 7 0x00000000001b217a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel()  :0
 8 0x00000000001c0fa1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::core::Status (triton::core::ModelRepositoryManager::ModelLifeCycle::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*), triton::core::ModelRepositoryManager::ModelLifeCycle*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*> > >::_M_run()  :0
 9 0x00000000000d6de4 std::error_code::default_error_condition()  ???:0
10 0x0000000000008609 start_thread()  ???:0
11 0x000000000011f163 clone()  ???:0
=================================
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] *** Process received signal ***
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] Signal: Segmentation fault (11)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] Signal code:  (-6)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] Failing at address: 0x24
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7f53ebeeb3c0]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 1] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1ef3a)[0x7f53e19bbf3a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 2] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1fd42)[0x7f53e19bcd42]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 3] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInstanceInitialize+0x38c)[0x7f53e19bf63c]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 4] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10ce8a)[0x7f53eb18ee8a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 5] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10e971)[0x7f53eb190971]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 6] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101a10)[0x7f53eb183a10]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 7] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b217a)[0x7f53eb23417a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 8] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c0fa1)[0x7f53eb242fa1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [ 9] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f53eacd1de4]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [10] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f53ebedf609]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f53ea9bc163]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00036] *** End of error message ***
Segmentation fault (core dumped)
byshiue commented 2 years ago

@byshiue Like this: /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store FT_LOG_LEVEL=DEBUG?

No. Please run it like this:

FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store

Also, please make sure you are using the latest main branch.

rtalaricw commented 2 years ago

I am. I just recently pulled v1.2 and built a new image.

rtalaricw commented 2 years ago

@byshiue Here are the logs with that command (they are the same as above):

FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store
root@fastertransformer-triton-predictor-default-00001-deploymen2j9nd:/opt/tritonserver# FT_LOG_LEVEL=DEBUG /opt/tritonserver/bin/tritonserver --model-repository=/mnt/pvc/triton-model-store
I0910 00:33:36.186877 73 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc09e000000' with size 268435456
I0910 00:33:36.191759 73 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0910 00:33:36.191772 73 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0910 00:33:36.527831 73 model_repository_manager.cc:1191] loading: fastertransformer:1
I0910 00:33:36.881606 73 libfastertransformer.cc:1226] TRITONBACKEND_Initialize: fastertransformer
I0910 00:33:36.881642 73 libfastertransformer.cc:1236] Triton TRITONBACKEND API version: 1.9
I0910 00:33:36.881650 73 libfastertransformer.cc:1242] 'fastertransformer' TRITONBACKEND API version: 1.9
I0910 00:33:36.881702 73 libfastertransformer.cc:1274] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
W0910 00:33:36.882960 73 libfastertransformer.cc:149] model configuration:
{
    "name": "fastertransformer",
    "platform": "",
    "backend": "fastertransformer",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1024,
    "input": [
        {
            "name": "input_ids",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "start_id",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "end_id",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "input_lengths",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "request_output_len",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "runtime_top_k",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "runtime_top_p",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "beam_search_diversity_rate",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "temperature",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "len_penalty",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "repetition_penalty",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "random_seed",
            "data_type": "TYPE_UINT64",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "is_return_log_probs",
            "data_type": "TYPE_BOOL",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "beam_width",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "bad_words_list",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                2,
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "stop_words_list",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                2,
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        },
        {
            "name": "prompt_learning_task_name_ids",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": true
        }
    ],
    "output": [
        {
            "name": "output_ids",
            "data_type": "TYPE_UINT32",
            "dims": [
                -1,
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "sequence_length",
            "data_type": "TYPE_UINT32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "cum_log_probs",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "output_log_probs",
            "data_type": "TYPE_FP32",
            "dims": [
                -1,
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "fastertransformer",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0,
                1
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "gptneox_20b",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "tensor_para_size": {
            "string_value": "2"
        },
        "data_type": {
            "string_value": "fp32"
        },
        "enable_custom_all_reduce": {
            "string_value": "0"
        },
        "model_type": {
            "string_value": "GPT-NeoX"
        },
        "model_checkpoint_path": {
            "string_value": "/mnt/pvc/triton-model-store/fastertransformer/1/"
        },
        "pipeline_para_size": {
            "string_value": "1"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0910 00:33:36.884884 73 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer (device 0)
W0910 00:33:36.953299 73 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
W0910 00:33:36.953344 73 libfastertransformer.cc:459] Model name gptneox_20b
W0910 00:33:36.953364 73 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
W0910 00:33:36.953370 73 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953376 73 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953380 73 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953385 73 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
W0910 00:33:36.953391 73 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953398 73 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953407 73 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953415 73 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953422 73 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953430 73 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
W0910 00:33:36.953436 73 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_UINT64, shape: [1]
W0910 00:33:36.953441 73 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
W0910 00:33:36.953447 73 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953456 73 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
W0910 00:33:36.953464 73 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
W0910 00:33:36.953471 73 libfastertransformer.cc:578] Get input name: prompt_learning_task_name_ids, type: TYPE_UINT32, shape: [1]
W0910 00:33:36.953482 73 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
W0910 00:33:36.953490 73 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
W0910 00:33:36.953497 73 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
W0910 00:33:36.953501 73 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:73   :0:78] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:     78) ====
 0 0x00000000000143c0 __funlockfile()  ???:0
 1 0x000000000001ef3a triton::backend::fastertransformer_backend::ModelInstanceState::ModelInstanceState()  :0
 2 0x000000000001fd42 triton::backend::fastertransformer_backend::ModelInstanceState::Create()  :0
 3 0x000000000002263c TRITONBACKEND_ModelInstanceInitialize()  ???:0
 4 0x000000000010ce8a triton::core::TritonModelInstance::CreateInstance()  :0
 5 0x000000000010e971 triton::core::TritonModelInstance::CreateInstances()  :0
 6 0x0000000000101a10 triton::core::TritonModel::Create()  :0
 7 0x00000000001b217a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel()  :0
 8 0x00000000001c0fa1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::core::Status (triton::core::ModelRepositoryManager::ModelLifeCycle::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*), triton::core::ModelRepositoryManager::ModelLifeCycle*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*> > >::_M_run()  :0
 9 0x00000000000d6de4 std::error_code::default_error_condition()  ???:0
10 0x0000000000008609 start_thread()  ???:0
11 0x000000000011f163 clone()  ???:0
=================================
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] *** Process received signal ***
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] Signal: Segmentation fault (11)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] Signal code:  (-6)
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] Failing at address: 0x49
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7fc0eeed13c0]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 1] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1ef3a)[0x7fc0e099ef3a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 2] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x1fd42)[0x7fc0e099fd42]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 3] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInstanceInitialize+0x38c)[0x7fc0e09a263c]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 4] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10ce8a)[0x7fc0ee172e8a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 5] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10e971)[0x7fc0ee174971]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 6] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101a10)[0x7fc0ee167a10]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 7] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b217a)[0x7fc0ee21817a]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 8] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1c0fa1)[0x7fc0ee226fa1]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [ 9] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7fc0edcb5de4]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [10] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fc0eeec5609]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fc0ed9a0163]
[fastertransformer-triton-predictor-default-00001-deploymen2j9nd:00073] *** End of error message ***
Segmentation fault (core dumped)
byshiue commented 2 years ago

Did you build the Docker image from the latest main branch?

rtalaricw commented 2 years ago

Yes, we are using the latest v1.2. It works fine with GPT-J.

rtalaricw commented 2 years ago

Are you able to replicate this error or does the latest v1.2 build work with GPT-NeoX?

byshiue commented 2 years ago

Yes, we are using the latest v1.2. It works fine with GPT-J.

The latest main branch and v1.2 are a little different; we have fixed some issues recently. Can you try the main branch directly?

rtalaricw commented 2 years ago

@byshiue I rebuilt it from the latest main branch. I am using the 22.07 base image, but now I am facing this error. Any help would be appreciated. Do you happen to know which CUDA driver version the base image expects, and which versions are meant to work together?

CUDA toolkit (nvcc output):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

Error:

what():  [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/triton-experiments/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:498 

Here are the full logs:

I0913 19:03:15.787037 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1204000000' with size 268435456
I0913 19:03:15.792745 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0913 19:03:15.805911 1 model_repository_manager.cc:1206] loading: fastertransformer:1
I0913 19:03:15.957966 1 libfastertransformer.cc:1478] TRITONBACKEND_Initialize: fastertransformer
I0913 19:03:15.957999 1 libfastertransformer.cc:1488] Triton TRITONBACKEND API version: 1.10
I0913 19:03:15.958004 1 libfastertransformer.cc:1494] 'fastertransformer' TRITONBACKEND API version: 1.10
I0913 19:03:15.958050 1 libfastertransformer.cc:1526] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I0913 19:03:15.959302 1 libfastertransformer.cc:218] Instance group type: KIND_CPU count: 1
I0913 19:03:15.959317 1 libfastertransformer.cc:248] Sequence Batching: disabled
E0913 19:03:15.959324 1 libfastertransformer.cc:324] Invalid configuration argument 'data_type': 
I0913 19:03:15.959327 1 libfastertransformer.cc:420] Before Loading Weights:
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: the provided PTX was compiled with an unsupported toolchain. /workspace/build/triton-experiments/build/_deps/repo-ft-src/src/fastertransformer/utils/cuda_utils.h:498 

[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** Process received signal ***
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal: Aborted (6)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal code:  (-6)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f1253a61420]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f125244d00b]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f125242c859]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f1252806911]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f125281238c]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f12528123f7]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f12528126a9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2f6d9)[0x7f124135f6d9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x276b5)[0x7f12413576b5]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x29af2)[0x7f1241359af2]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [10] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f124135a071]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101c52)[0x7f1252cf0c52]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b6b9a)[0x7f1252da5b9a]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b7332)[0x7f1252da6332]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x30d780)[0x7f1252efc780]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f125283ede4]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [16] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f1253a55609]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [17] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f1252529133]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** End of error message ***
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:1    :0:10] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:     10) ====
 0 0x0000000000014420 __funlockfile()  ???:0
 1 0x0000000000022941 abort()  ???:0
 2 0x000000000009e911 __cxa_throw_bad_array_new_length()  ???:0
 3 0x00000000000aa38c std::rethrow_exception()  ???:0
 4 0x00000000000aa3f7 std::terminate()  ???:0
 5 0x00000000000aa6a9 __cxa_throw()  ???:0
 6 0x000000000002f6d9 fastertransformer::check<cudaError>()  :0
 7 0x00000000000276b5 triton::backend::fastertransformer_backend::ModelState::ModelState()  :0
 8 0x0000000000029af2 triton::backend::fastertransformer_backend::ModelState::Create()  :0
 9 0x000000000002a071 TRITONBACKEND_ModelInitialize()  ???:0
10 0x0000000000101c52 triton::core::TritonModel::Create()  :0
11 0x00000000001b6b9a triton::core::ModelRepositoryManager::ModelLifeCycle::CreateModel()  :0
12 0x00000000001b7332 std::_Function_handler<void (), triton::core::ModelRepositoryManager::ModelLifeCycle::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelRepositoryManager::ModelLifeCycle::ModelInfo*)::{lambda()#1}>::_M_invoke()  model_repository_manager.cc:0
13 0x000000000030d780 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run()  thread_pool.cc:0
14 0x00000000000d6de4 std::error_code::default_error_condition()  ???:0
15 0x0000000000008609 start_thread()  ???:0
16 0x000000000011f133 clone()  ???:0
=================================
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** Process received signal ***
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal: Segmentation fault (11)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Signal code:  (-6)
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] Failing at address: 0x1
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f1253a61420]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7f125242c941]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 2] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7f1252806911]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7f125281238c]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7f12528123f7]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7f12528126a9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 6] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x2f6d9)[0x7f124135f6d9]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 7] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x276b5)[0x7f12413576b5]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 8] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(+0x29af2)[0x7f1241359af2]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [ 9] /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so(TRITONBACKEND_ModelInitialize+0x341)[0x7f124135a071]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [10] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x101c52)[0x7f1252cf0c52]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [11] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b6b9a)[0x7f1252da5b9a]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [12] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b7332)[0x7f1252da6332]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [13] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x30d780)[0x7f1252efc780]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [14] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f125283ede4]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [15] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f1253a55609]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] [16] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f1252529133]
[fastertransformer-tr7134a4799657a850f4c0843eb76bdf20-deplojd92h:00001] *** End of error message ***
byshiue commented 2 years ago

Can you post the driver version reported by nvidia-smi?

Also, which GPU are you using?

rtalaricw commented 2 years ago

Here is the information from nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          On   | 00000000:A1:00.0 Off |                  Off |
|  0%   37C    P0    75W / 300W |  12493MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3060      G                                      25MiB |
|    0   N/A  N/A   2500362      C                                   12465MiB |
+-----------------------------------------------------------------------------+

We are currently running on A40.

byshiue commented 2 years ago

Hi, according to the support matrix here https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html, the 22.07 docker image requires driver 515 or later. Can you try 22.04, which only requires driver 510?
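
A minimal sketch for checking the installed driver against that requirement (the 515/510 minimums are taken from the support matrix above; confirm the exact values there):

import subprocess

# Query the installed driver version via nvidia-smi.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True,
)
driver = out.strip().splitlines()[0]      # e.g. "510.60.02"
major = int(driver.split(".")[0])

required = 515                            # minimum for the 22.07 image per the support matrix
if major < required:
    print(f"Driver {driver} < {required}: use an older image such as 22.04 (needs 510).")
else:
    print(f"Driver {driver} meets the 22.07 requirement.")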

rtalaricw commented 2 years ago

It works with 22.04 and two A40s. Closing this now.