Closed yanncaniouoracle closed 1 month ago
Python Version: CPython 3.10.12
Operating System: Linux 6.5.0-1023-oracle-64k
CPU Architecture: aarch64
Driver Version: 550.67
CUDA Version: 12.4
GPU Architecture: NVIDIA GH200 480GB
@juney-nvidia @ncomly-nvidia
examples
Steps to reproduce the behaviour:
Build the container with Docker from the current main branch. The Dockerfile uses:
ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver
ARG BASE_TAG=24.03-py3
Commands to reproduce:
cd tensorrtllm_backend
git lfs install
git submodule update --init --recursive
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm --build-arg TORCH_INSTALL_TYPE="src_non_cxx11_abi" -f dockerfile/Dockerfile.trt_llm_backend .
A successful container build that allows the user to build model engines.
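When the build does not succeed, the linker output can be very long, with the same missing symbol reported several times. The following is a minimal sketch for condensing such a log into a list of unique undefined symbols, grouped by owning class. The helper names `summarize_undefined_refs` and `owning_class` are hypothetical and not part of TensorRT-LLM; the sample lines are copied from the log in this report, and the regular expression assumes GNU ld's usual "undefined reference to" message format.

```python
import re
from collections import Counter

# GNU ld quotes the missing symbol with a backtick and a closing apostrophe.
# Assumption: the demangled symbol itself contains no apostrophe.
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")


def summarize_undefined_refs(log_text):
    """Return a Counter mapping each undefined symbol to how often it is reported."""
    # Collapse all whitespace first, so symbols that were wrapped across
    # several lines of a captured log are matched as one piece.
    flat = " ".join(log_text.split())
    return Counter(UNDEFINED_REF.findall(flat))


def owning_class(symbol):
    """Best-effort guess: drop the argument list, then the final '::name' part."""
    qualified = symbol.split("(", 1)[0]
    return qualified.rsplit("::", 1)[0]


# Three lines taken verbatim from the linker output in this report.
sample = """
2220.0 <artificial>:(.text+0x4d70): undefined reference to `tensorrt_llm::executor::Response::Response(unsigned long, std::string)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x4dcc): undefined reference to `tensorrt_llm::executor::Response::Response(unsigned long, std::string)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5a60): undefined reference to `tensorrt_llm::executor::Response::getErrorMsg() const'
"""

refs = summarize_undefined_refs(sample)

# Aggregate the per-symbol counts by the class each symbol belongs to.
by_class = Counter()
for symbol, count in refs.items():
    by_class[owning_class(symbol)] += count

for symbol, count in refs.most_common():
    print(count, symbol)
```

Saving the full `docker build` output to a file and passing its contents to `summarize_undefined_refs` should give a much shorter list to attach to a report or compare between builds.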
RUN cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root="/usr/local/tensorrt" -i -c && cd .. fails with the following error:
RUN cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root="/usr/local/tensorrt" -i -c && cd ..
2206.4 [100%] Linking CXX shared module bindings.cpython-310-aarch64-linux-gnu.so 2207.7 lto-wrapper: warning: using serial compilation of 18 LTRANS jobs 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans11.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<unsigned long, std::string>::execute<pybind11::class_<tensorrt_llm::executor::Response>, pybind11::arg, pybind11::arg, 0>(pybind11::class_<tensorrt_llm::executor::Response>&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::value_and_holder&, unsigned long, std::string)#1}, void, pybind11::detail::value_and_holder&, unsigned long, std::string, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg>(pybind11::class_<tensorrt_llm::executor::Response>&&, void (*)(pybind11::detail::value_and_holder&, unsigned long, std::string), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)': 2220.0 <artificial>:(.text+0x4d70): undefined reference to `tensorrt_llm::executor::Response::Response(unsigned long, std::string)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x4dcc): undefined reference to `tensorrt_llm::executor::Response::Response(unsigned long, std::string)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans11.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<bool, std::string>::execute<pybind11::class_<tensorrt_llm::executor::OrchestratorConfig>, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::OrchestratorConfig>&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, bool, std::string)#1}, void, pybind11::detail::value_and_holder&, bool, std::string, pybind11::name, 
pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg_v, pybind11::arg_v>(pybind11::class_<tensorrt_llm::executor::OrchestratorConfig>&&, void (*)(pybind11::detail::value_and_holder&, bool, std::string), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)': 2220.0 <artificial>:(.text+0x4fe0): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::OrchestratorConfig(bool, std::string, std::shared_ptr<tensorrt_llm::mpi::MpiComm>)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5074): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::OrchestratorConfig(bool, std::string, std::shared_ptr<tensorrt_llm::mpi::MpiComm>)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans12.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&>::execute<pybind11::class_<tensorrt_llm::pybind::executor::Executor>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, 0>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::value_and_holder&, std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)#1}, void, pybind11::detail::value_and_holder&, std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg, 
pybind11::arg>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&&, void (*)(pybind11::detail::value_and_holder&, std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)': 2220.0 <artificial>:(.text+0x3e24): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x3fe4): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans12.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&>::execute<pybind11::class_<tensorrt_llm::pybind::executor::Executor>, pybind11::arg, pybind11::arg, pybind11::arg, 0>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)#1}, void, pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&, pybind11::name, pybind11::is_method, pybind11::sibling, 
pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&&, void (*)(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)': 2220.0 <artificial>:(.text+0x4594): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x462c): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans13.ltrans.o: in function `void pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > 
> const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode> >::call_impl<void, pybind11::detail::initimpl::constructor<int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode> >::execute<pybind11::class_<tensorrt_llm::executor::ExecutorConfig>, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::ExecutorConfig>&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, 
pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode>)#1}&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, pybind11::detail::void_type>(pybind11::detail::initimpl::constructor<int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, 
tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode> >::execute<pybind11::class_<tensorrt_llm::executor::ExecutorConfig>, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::ExecutorConfig>&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, 
std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode>)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul>, pybind11::detail::void_type&&) && [clone .constprop.0]': 2220.0 <artificial>:(.text+0x69d8): undefined reference to `tensorrt_llm::executor::ExecutorConfig::ExecutorConfig(int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, std::optional<tensorrt_llm::executor::PeftCacheConfig> const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode>, float)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans14.ltrans.o: in function `void pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, 
std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long> >::call_impl<void, pybind11::detail::initimpl::constructor<std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long> >::execute<pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig 
const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>)#1}&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, pybind11::detail::void_type>(pybind11::detail::initimpl::constructor<std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long> >::execute<pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, 
pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul>, pybind11::detail::void_type&&) && [clone .constprop.0]': 2220.0 <artificial>:(.text+0x648): undefined reference to `tensorrt_llm::batch_manager::GptManager::GptManager(std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::batch_manager::NamedTensor, std::allocator<tensorrt_llm::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>, std::optional<int>, bool)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans14.ltrans.o: in 
function `void pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string> >::call_impl<void, pybind11::detail::initimpl::constructor<std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string> >::execute<pybind11::class_<tensorrt_llm::executor::Request>, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::Request>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v 
const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string>)#1}&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, 14ul, pybind11::detail::void_type>(pybind11::detail::initimpl::constructor<std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string> >::execute<pybind11::class_<tensorrt_llm::executor::Request>, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 
pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::Request>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string>)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, 14ul>, pybind11::detail::void_type&&) && [clone .constprop.0]': 2220.0 <artificial>:(.text+0x1adc): undefined reference to `tensorrt_llm::executor::Request::Request(std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, 
std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string>)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans8.ltrans.o: in function `tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)': 2220.0 <artificial>:(.text+0x532c): undefined reference to `tensorrt_llm::executor::Request::getBadWords() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5330): undefined reference to `tensorrt_llm::executor::Request::setBadWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x533c): undefined reference to `tensorrt_llm::executor::Request::getBadWords() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5344): undefined reference to `tensorrt_llm::executor::Request::setBadWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5358): undefined reference to `tensorrt_llm::executor::Request::getStopWords() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x535c): undefined reference to `tensorrt_llm::executor::Request::setStopWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5368): undefined reference to `tensorrt_llm::executor::Request::getStopWords() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5370): undefined reference to `tensorrt_llm::executor::Request::setStopWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5414): undefined reference to `tensorrt_llm::executor::Request::setLogitsPostProcessorName(std::string const&)' 2220.0 /usr/bin/ld: 
<artificial>:(.text+0x5418): undefined reference to `tensorrt_llm::executor::Request::getLogitsPostProcessorName() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5424): undefined reference to `tensorrt_llm::executor::Request::getLogitsPostProcessorName() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x542c): undefined reference to `tensorrt_llm::executor::Request::setLogitsPostProcessorName(std::string const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5a60): undefined reference to `tensorrt_llm::executor::Response::getErrorMsg() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x5a6c): undefined reference to `tensorrt_llm::executor::Response::getErrorMsg() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x671c): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::setWorkerExecutablePath(std::string const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x6720): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::getWorkerExecutablePath() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x672c): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::getWorkerExecutablePath() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x6734): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::setWorkerExecutablePath(std::string const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x788c): undefined reference to `tensorrt_llm::executor::ExecutorConfig::setLogitsPostProcessorMap(std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, 
std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x7890): undefined reference to `tensorrt_llm::executor::ExecutorConfig::getLogitsPostProcessorMap() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x789c): undefined reference to `tensorrt_llm::executor::ExecutorConfig::getLogitsPostProcessorMap() const' 2220.0 /usr/bin/ld: <artificial>:(.text+0x78a4): undefined reference to `tensorrt_llm::executor::ExecutorConfig::setLogitsPostProcessorMap(std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > const&)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans9.ltrans.o: in function `pybind11::cpp_function::initialize<tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::IterationStats const&)#1}, std::string, tensorrt_llm::executor::IterationStats const&, pybind11::name, pybind11::is_method, pybind11::sibling>(tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::IterationStats const&)#1}&&, std::string (*)(tensorrt_llm::executor::IterationStats const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0]': 2220.0 <artificial>:(.text+0x857c): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::IterationStats 
const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x85e4): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::IterationStats const&)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans9.ltrans.o: in function `pybind11::cpp_function::initialize<tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStats const&)#2}, std::string, tensorrt_llm::executor::RequestStats const&, pybind11::name, pybind11::is_method, pybind11::sibling>(tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStats const&)#2}&&, std::string (*)(tensorrt_llm::executor::RequestStats const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0]': 2220.0 <artificial>:(.text+0x878c): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStats const&)' 2220.0 /usr/bin/ld: <artificial>:(.text+0x87f4): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStats const&)' 2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans9.ltrans.o: in function `pybind11::cpp_function::initialize<tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStatsPerIteration const&)#3}, std::string, tensorrt_llm::executor::RequestStatsPerIteration const&, pybind11::name, pybind11::is_method, pybind11::sibling>(tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStatsPerIteration const&)#3}&&, std::string (*)(tensorrt_llm::executor::RequestStatsPerIteration const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0]': 2220.0 
<artificial>:(.text+0x8f68): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStatsPerIteration const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x8fd0): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStatsPerIteration const&)'
2220.0 collect2: error: ld returned 1 exit status
2220.0 gmake[3]: *** [tensorrt_llm/pybind/CMakeFiles/bindings.dir/build.make:250: tensorrt_llm/pybind/bindings.cpython-310-aarch64-linux-gnu.so] Error 1
2220.0 gmake[2]: *** [CMakeFiles/Makefile2:1313: tensorrt_llm/pybind/CMakeFiles/bindings.dir/all] Error 2
2220.0 gmake[1]: *** [CMakeFiles/Makefile2:1320: tensorrt_llm/pybind/CMakeFiles/bindings.dir/rule] Error 2
2220.0 gmake: *** [Makefile:374: bindings] Error 2
2220.0 Traceback (most recent call last):
2220.0   File "/app/tensorrt_llm/scripts/build_wheel.py", line 352, in <module>
2220.0     main(**vars(args))
2220.0   File "/app/tensorrt_llm/scripts/build_wheel.py", line 166, in main
2220.0     build_run(
2220.0   File "/usr/lib/python3.10/subprocess.py", line 526, in run
2220.0     raise CalledProcessError(retcode, process.args,
2220.0 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 72 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings executorWorker ' returned non-zero exit status 2.
------
Dockerfile.trt_llm_backend:49
--------------------
  47 | COPY scripts scripts
  48 | COPY tensorrt_llm tensorrt_llm
  49 | >>> RUN cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root="${TRT_ROOT}" -i -c && cd ..
  50 |
  51 | FROM trt_llm_builder as trt_llm_backend_builder
--------------------
ERROR: failed to solve: process "/bin/sh -c cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root=\"${TRT_ROOT}\" -i -c && cd .." did not complete successfully: exit code: 1
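The linker output above repeats the same few symbols many times, so it may help to collapse it into a list of distinct undefined references. The sketch below is only a log-reading aid, not part of TensorRT-LLM; the function names `summarize_undefined` and `abi_sensitive` are made up for illustration. It also flags symbols whose signatures mention `std::string` or `std::list`, since those are the types affected by libstdc++'s dual ABI. That is an assumption about one possible cause (a `_GLIBCXX_USE_CXX11_ABI` mismatch between the bindings and the library they link against), not a confirmed diagnosis.

```python
import re
from collections import Counter

# Captures the symbol in ld's message: undefined reference to `symbol'
UNDEFINED_REF = re.compile(r"undefined reference to `(.+?)'")

# Types whose symbol names change with the libstdc++ ABI setting.
# Assumption: an ABI mismatch is only one possible cause of the errors.
ABI_SENSITIVE_TYPES = ("std::string", "std::list")


def summarize_undefined(log_text):
    """Return a Counter mapping each distinct undefined symbol to its count."""
    return Counter(UNDEFINED_REF.findall(log_text))


def abi_sensitive(symbols):
    """Return the symbols whose signatures mention an ABI-sensitive type."""
    return sorted(s for s in symbols if any(t in s for t in ABI_SENSITIVE_TYPES))


# Three lines taken from the log in this issue.
sample = """\
/usr/bin/ld: <artificial>:(.text+0x532c): undefined reference to `tensorrt_llm::executor::Request::getBadWords() const'
/usr/bin/ld: <artificial>:(.text+0x533c): undefined reference to `tensorrt_llm::executor::Request::getBadWords() const'
/usr/bin/ld: <artificial>:(.text+0x5414): undefined reference to `tensorrt_llm::executor::Request::setLogitsPostProcessorName(std::string const&)'
"""

counts = summarize_undefined(sample)
for symbol, n in counts.most_common():
    print(f"{n:3d}  {symbol}")

print("Mentions std::string or std::list:")
for symbol in abi_sensitive(counts):
    print("    ", symbol)
```

To run it on a full build, save the `docker build` output to a file and pass its contents to `summarize_undefined` in place of `sample`.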
My goal is to use the recent weight streaming feature to deploy models that do not fit in the 96 GB of GPU VRAM alone.
Curious which cloud platform you rented these on
I finally managed to build TensorRT-LLM by following the "Building from Source Code on Linux" guide, Option 2: "Build TensorRT-LLM Step-By-Step".