Cannot build Docker container on Grace Hopper

System Info

Python Version: CPython 3.10.12
Operating System: Linux 6.5.0-1023-oracle-64k
CPU Architecture: aarch64
Driver Version: 550.67
CUDA Version: 12.4
GPU Architecture: NVIDIA GH200 480GB

Who can help?

@juney-nvidia @ncomly-nvidia

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Steps to reproduce the behaviour:

Build the container with Docker from the current main branch. The Dockerfile uses:

ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver
ARG BASE_TAG=24.03-py3

Commands to reproduce:

cd tensorrtllm_backend
git lfs install
git submodule update --init --recursive

DOCKER_BUILDKIT=1 docker build -t triton_trt_llm --build-arg TORCH_INSTALL_TYPE="src_non_cxx11_abi" -f dockerfile/Dockerfile.trt_llm_backend .

Expected behavior

A successful container build that allows the user to build model engines.

Actual behavior

The Dockerfile step RUN cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root="/usr/local/tensorrt" -i -c && cd .. fails at link time with the following errors:

2206.4 [100%] Linking CXX shared module bindings.cpython-310-aarch64-linux-gnu.so
2207.7 lto-wrapper: warning: using serial compilation of 18 LTRANS jobs
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans11.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<unsigned long, std::string>::execute<pybind11::class_<tensorrt_llm::executor::Response>, pybind11::arg, pybind11::arg, 0>(pybind11::class_<tensorrt_llm::executor::Response>&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::value_and_holder&, unsigned long, std::string)#1}, void, pybind11::detail::value_and_holder&, unsigned long, std::string, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg>(pybind11::class_<tensorrt_llm::executor::Response>&&, void (*)(pybind11::detail::value_and_holder&, unsigned long, std::string), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)':
2220.0 <artificial>:(.text+0x4d70): undefined reference to `tensorrt_llm::executor::Response::Response(unsigned long, std::string)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x4dcc): undefined reference to `tensorrt_llm::executor::Response::Response(unsigned long, std::string)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans11.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<bool, std::string>::execute<pybind11::class_<tensorrt_llm::executor::OrchestratorConfig>, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::OrchestratorConfig>&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, bool, std::string)#1}, void, pybind11::detail::value_and_holder&, bool, std::string, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg_v, pybind11::arg_v>(pybind11::class_<tensorrt_llm::executor::OrchestratorConfig>&&, void (*)(pybind11::detail::value_and_holder&, bool, std::string), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)':
2220.0 <artificial>:(.text+0x4fe0): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::OrchestratorConfig(bool, std::string, std::shared_ptr<tensorrt_llm::mpi::MpiComm>)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5074): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::OrchestratorConfig(bool, std::string, std::shared_ptr<tensorrt_llm::mpi::MpiComm>)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans12.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&>::execute<pybind11::class_<tensorrt_llm::pybind::executor::Executor>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, 0>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::value_and_holder&, std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)#1}, void, pybind11::detail::value_and_holder&, std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&&, void (*)(pybind11::detail::value_and_holder&, std::string const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)':
2220.0 <artificial>:(.text+0x3e24): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x3fe4): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans12.ltrans.o: in function `pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&>::execute<pybind11::class_<tensorrt_llm::pybind::executor::Executor>, pybind11::arg, pybind11::arg, pybind11::arg, 0>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)#1}, void, pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg>(pybind11::class_<tensorrt_llm::pybind::executor::Executor>&&, void (*)(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&)':
2220.0 <artificial>:(.text+0x4594): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x462c): undefined reference to `tensorrt_llm::executor::Executor::Executor(std::filesystem::path const&, tensorrt_llm::executor::ModelType, tensorrt_llm::executor::ExecutorConfig const&)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans13.ltrans.o: in function `void pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode> >::call_impl<void, pybind11::detail::initimpl::constructor<int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, 
std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode> >::execute<pybind11::class_<tensorrt_llm::executor::ExecutorConfig>, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::ExecutorConfig>&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, 
std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode>)#1}&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, pybind11::detail::void_type>(pybind11::detail::initimpl::constructor<int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode> >::execute<pybind11::class_<tensorrt_llm::executor::ExecutorConfig>, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::ExecutorConfig>&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, 
pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, tensorrt_llm::executor::PeftCacheConfig const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode>)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul>, pybind11::detail::void_type&&) && [clone .constprop.0]':
2220.0 <artificial>:(.text+0x69d8): undefined reference to `tensorrt_llm::executor::ExecutorConfig::ExecutorConfig(int, tensorrt_llm::executor::SchedulerConfig const&, tensorrt_llm::executor::KvCacheConfig const&, bool, bool, int, int, tensorrt_llm::executor::BatchingType, std::optional<tensorrt_llm::executor::ParallelConfig>, std::optional<tensorrt_llm::executor::PeftCacheConfig> const&, std::optional<std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > >, std::optional<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::DecodingMode>, float)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans14.ltrans.o: in function `void pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long> >::call_impl<void, pybind11::detail::initimpl::constructor<std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long> >::execute<pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 
0>(pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>)#1}&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, pybind11::detail::void_type>(pybind11::detail::initimpl::constructor<std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long> 
>::execute<pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::pybind::batch_manager::GptManager>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<tensorrt_llm::pybind::batch_manager::InferenceRequest, std::allocator<tensorrt_llm::pybind::batch_manager::InferenceRequest> > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::pybind::batch_manager::NamedTensor, std::allocator<tensorrt_llm::pybind::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul>, pybind11::detail::void_type&&) && [clone .constprop.0]':
2220.0 <artificial>:(.text+0x648): undefined reference to `tensorrt_llm::batch_manager::GptManager::GptManager(std::filesystem::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::executor::SchedulerConfig const&, std::function<std::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > > (int)>, std::function<void (unsigned long, std::list<tensorrt_llm::batch_manager::NamedTensor, std::allocator<tensorrt_llm::batch_manager::NamedTensor> > const&, bool, std::string const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::string const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>, std::optional<int>, bool)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans14.ltrans.o: in function `void pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string> >::call_impl<void, pybind11::detail::initimpl::constructor<std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string> >::execute<pybind11::class_<tensorrt_llm::executor::Request>, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::Request>&, pybind11::arg const&, pybind11::arg const&, 
pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string>)#1}&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, 14ul, pybind11::detail::void_type>(pybind11::detail::initimpl::constructor<std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string> >::execute<pybind11::class_<tensorrt_llm::executor::Request>, pybind11::arg, pybind11::arg, pybind11::arg_v, 
pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, pybind11::arg_v, 0>(pybind11::class_<tensorrt_llm::executor::Request>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string>)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul, 8ul, 9ul, 10ul, 11ul, 12ul, 13ul, 14ul>, pybind11::detail::void_type&&) && [clone .constprop.0]':
2220.0 <artificial>:(.text+0x1adc): undefined reference to `tensorrt_llm::executor::Request::Request(std::vector<int, std::allocator<int> >, int, bool, tensorrt_llm::executor::SamplingConfig const&, tensorrt_llm::executor::OutputConfig const&, std::optional<int> const&, std::optional<int> const&, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >, std::optional<tensorrt_llm::executor::Tensor>, std::optional<tensorrt_llm::executor::SpeculativeDecodingConfig>, std::optional<tensorrt_llm::executor::PromptTuningConfig>, std::optional<tensorrt_llm::executor::LoraConfig>, std::optional<std::string>)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans8.ltrans.o: in function `tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)':
2220.0 <artificial>:(.text+0x532c): undefined reference to `tensorrt_llm::executor::Request::getBadWords() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5330): undefined reference to `tensorrt_llm::executor::Request::setBadWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x533c): undefined reference to `tensorrt_llm::executor::Request::getBadWords() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5344): undefined reference to `tensorrt_llm::executor::Request::setBadWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5358): undefined reference to `tensorrt_llm::executor::Request::getStopWords() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x535c): undefined reference to `tensorrt_llm::executor::Request::setStopWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5368): undefined reference to `tensorrt_llm::executor::Request::getStopWords() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5370): undefined reference to `tensorrt_llm::executor::Request::setStopWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5414): undefined reference to `tensorrt_llm::executor::Request::setLogitsPostProcessorName(std::string const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5418): undefined reference to `tensorrt_llm::executor::Request::getLogitsPostProcessorName() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5424): undefined reference to `tensorrt_llm::executor::Request::getLogitsPostProcessorName() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x542c): undefined reference to `tensorrt_llm::executor::Request::setLogitsPostProcessorName(std::string const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5a60): undefined reference to `tensorrt_llm::executor::Response::getErrorMsg() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5a6c): undefined reference to `tensorrt_llm::executor::Response::getErrorMsg() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x671c): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::setWorkerExecutablePath(std::string const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x6720): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::getWorkerExecutablePath() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x672c): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::getWorkerExecutablePath() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x6734): undefined reference to `tensorrt_llm::executor::OrchestratorConfig::setWorkerExecutablePath(std::string const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x788c): undefined reference to `tensorrt_llm::executor::ExecutorConfig::setLogitsPostProcessorMap(std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x7890): undefined reference to `tensorrt_llm::executor::ExecutorConfig::getLogitsPostProcessorMap() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x789c): undefined reference to `tensorrt_llm::executor::ExecutorConfig::getLogitsPostProcessorMap() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x78a4): undefined reference to `tensorrt_llm::executor::ExecutorConfig::setLogitsPostProcessorMap(std::unordered_map<std::string, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::function<void (unsigned long, tensorrt_llm::executor::Tensor&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::shared_ptr<tensorrt_llm::runtime::CudaStream>&)> > > > const&)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans9.ltrans.o: in function `pybind11::cpp_function::initialize<tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::IterationStats const&)#1}, std::string, tensorrt_llm::executor::IterationStats const&, pybind11::name, pybind11::is_method, pybind11::sibling>(tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::IterationStats const&)#1}&&, std::string (*)(tensorrt_llm::executor::IterationStats const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0]':
2220.0 <artificial>:(.text+0x857c): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::IterationStats const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x85e4): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::IterationStats const&)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans9.ltrans.o: in function `pybind11::cpp_function::initialize<tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStats const&)#2}, std::string, tensorrt_llm::executor::RequestStats const&, pybind11::name, pybind11::is_method, pybind11::sibling>(tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStats const&)#2}&&, std::string (*)(tensorrt_llm::executor::RequestStats const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0]':
2220.0 <artificial>:(.text+0x878c): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStats const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x87f4): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStats const&)'
2220.0 /usr/bin/ld: /tmp/cc908EdA.ltrans9.ltrans.o: in function `pybind11::cpp_function::initialize<tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStatsPerIteration const&)#3}, std::string, tensorrt_llm::executor::RequestStatsPerIteration const&, pybind11::name, pybind11::is_method, pybind11::sibling>(tensorrt_llm::pybind::executor::InitBindings(pybind11::module_&)::{lambda(tensorrt_llm::executor::RequestStatsPerIteration const&)#3}&&, std::string (*)(tensorrt_llm::executor::RequestStatsPerIteration const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0]':
2220.0 <artificial>:(.text+0x8f68): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStatsPerIteration const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x8fd0): undefined reference to `tensorrt_llm::executor::JsonSerialization::toJsonStr(tensorrt_llm::executor::RequestStatsPerIteration const&)'
2220.0 collect2: error: ld returned 1 exit status
2220.0 gmake[3]: *** [tensorrt_llm/pybind/CMakeFiles/bindings.dir/build.make:250: tensorrt_llm/pybind/bindings.cpython-310-aarch64-linux-gnu.so] Error 1
2220.0 gmake[2]: *** [CMakeFiles/Makefile2:1313: tensorrt_llm/pybind/CMakeFiles/bindings.dir/all] Error 2
2220.0 gmake[1]: *** [CMakeFiles/Makefile2:1320: tensorrt_llm/pybind/CMakeFiles/bindings.dir/rule] Error 2
2220.0 gmake: *** [Makefile:374: bindings] Error 2
2220.0 Traceback (most recent call last):
2220.0   File "/app/tensorrt_llm/scripts/build_wheel.py", line 352, in <module>
2220.0     main(**vars(args))
2220.0   File "/app/tensorrt_llm/scripts/build_wheel.py", line 166, in main
2220.0     build_run(
2220.0   File "/usr/lib/python3.10/subprocess.py", line 526, in run
2220.0     raise CalledProcessError(retcode, process.args,
2220.0 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 72 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings  executorWorker ' returned non-zero exit status 2.
------
Dockerfile.trt_llm_backend:49
--------------------
  47 |     COPY scripts scripts
  48 |     COPY tensorrt_llm tensorrt_llm
  49 | >>> RUN cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root="${TRT_ROOT}" -i -c && cd ..
  50 |
  51 |     FROM trt_llm_builder as trt_llm_backend_builder
--------------------
ERROR: failed to solve: process "/bin/sh -c cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root=\"${TRT_ROOT}\" -i -c && cd .." did not complete successfully: exit code: 1
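
The linker output above repeats most symbols several times, which makes it hard to see how many distinct functions are actually missing. Below is a minimal sketch for condensing it, assuming the build output has been captured as text. The helper names `undefined_symbols` and `owning_class` are hypothetical and not part of TensorRT-LLM; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# GNU ld prints the demangled symbol between a backtick and a closing quote.
UNDEFINED_REF = re.compile(r"undefined reference to `(.+)'\s*$")


def undefined_symbols(log_text):
    """Return the unique undefined symbols in first-seen order."""
    unique = {}
    for line in log_text.splitlines():
        match = UNDEFINED_REF.search(line)
        if match:
            unique.setdefault(match.group(1), None)
    return list(unique)


def owning_class(symbol):
    """Strip the argument list and the member name, leaving the qualified class."""
    qualified_name = symbol.split("(", 1)[0]
    return qualified_name.rsplit("::", 1)[0]


# A few lines taken verbatim from the build log (hypothetical variable name).
SAMPLE_LOG = """\
2220.0 /usr/bin/ld: <artificial>:(.text+0x5358): undefined reference to `tensorrt_llm::executor::Request::getStopWords() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x535c): undefined reference to `tensorrt_llm::executor::Request::setStopWords(std::list<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5368): undefined reference to `tensorrt_llm::executor::Request::getStopWords() const'
2220.0 /usr/bin/ld: <artificial>:(.text+0x5a60): undefined reference to `tensorrt_llm::executor::Response::getErrorMsg() const'
"""

symbols = undefined_symbols(SAMPLE_LOG)
per_class = Counter(owning_class(symbol) for symbol in symbols)
for cls, count in per_class.items():
    print(f"{count} undefined symbol(s) in {cls}")
```

Run over the full log, this kind of grouping should show that the missing symbols all live in the `tensorrt_llm::executor` namespace. The demangled names use `std::string` and `std::list` rather than the `std::__cxx11::` forms, so one possible cause is a libstdc++ ABI mismatch related to the `src_non_cxx11_abi` build argument, but that is an assumption and not something the log confirms.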

additional notes

My goal is to use the recent weight streaming feature to deploy models that do not fit in the GPU's 96 GB of VRAM alone.
