Open arya-samsung opened 4 weeks ago
This is most likely caused by your Batch Manager static files being incomplete.
https://github.com/NVIDIA/TensorRT-LLM/tree/main/cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu - is this the right one?
thanks for the lead, will check on this :)
After fixing the batch manager files issue, got this error:
Installed /tmp/tritonbuild/tensorrtllm/tensorrt_llm/3rdparty/cutlass/python
Processing dependencies for cutlass-library==3.4.1
Finished processing dependencies for cutlass-library==3.4.1
-- MANUALLY APPENDING FLAG TO COMPILE FOR SM_90a.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Operating System: ubuntu, 22.04
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /usr/local/lib/python3.10/dist-packages/pybind11/include (found version "2.13.1")
CMake Error at tensorrt_llm/plugins/CMakeLists.txt:108 (set_target_properties):
  set_target_properties called with incorrect number of arguments.
-- Found Python: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
-- ========================= Importing and creating target nvonnxparser ==========================
-- Looking for library nvonnxparser
-- Library that was found /usr/lib/x86_64-linux-gnu/libnvonnxparser.so
-- ==========================================================================================
-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
File "/tmp/tritonbuild/tensorrtllm/build/../tensorrt_llm/scripts/build_wheel.py", line 332, in
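For what it's worth, a common cause of this particular CMake message (a general CMake note, not a confirmed diagnosis of line 108 in this repo): an unquoted variable in the property list expands to nothing, leaving `set_target_properties` with an odd number of tokens. A minimal, hypothetical illustration (not the actual TensorRT-LLM code):

```cmake
# Hypothetical standalone example. If MY_SOVERSION is unset, the unquoted
# ${MY_SOVERSION} expands to nothing, so the call effectively becomes
#   set_target_properties(foo PROPERTIES SOVERSION)
# and CMake reports "called with incorrect number of arguments".
add_library(foo STATIC foo.cpp)
set_target_properties(foo PROPERTIES SOVERSION ${MY_SOVERSION})
```

Quoting the variable ("${MY_SOVERSION}") or ensuring it is set before the call avoids the error.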
Were you able to figure it out @arya-samsung? Facing a similar issue with branch r24.06.
nope :( still facing it. Will update here if I find a solution; do let me know too in case you find a workaround/solution.
Hi,
I encountered the same issue when I followed Option 1 of "Build the Docker Container".
I did manage to fix the issue:
1) Open ./build.py
2) Modify this line by replacing triton_tensorrtllm_worker with trtllmExecutorWorker:
Previous version:
cmake_script.cp(
os.path.join(tensorrtllm_be_dir, "build", "triton_tensorrtllm_worker"),
cmake_destination_dir,
)
Fixed version:
cmake_script.cp(
os.path.join(tensorrtllm_be_dir, "build", "trtllmExecutorWorker"),
cmake_destination_dir,
)
3) Run the script and it should work
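If you prefer not to edit build.py by hand, the step-2 rename can be applied with a one-line sed substitution. A self-contained sketch, demonstrated here on a scratch copy of the affected line (in practice you would point sed at ./build.py in your checkout):

```shell
# Apply the rename from step 2 as a sed substitution. Run on a scratch file
# here so the example is self-contained; GNU sed's -i edits in place.
tmp=$(mktemp)
echo 'os.path.join(tensorrtllm_be_dir, "build", "triton_tensorrtllm_worker"),' > "$tmp"
sed -i 's/triton_tensorrtllm_worker/trtllmExecutorWorker/' "$tmp"
cat "$tmp"   # the line now refers to trtllmExecutorWorker
rm -f "$tmp"
```

Note that `sed -i` without a suffix argument is GNU sed behavior (fine in the Ubuntu build container); BSD/macOS sed would need `-i ''`.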
Credits to #7194 for the fix. I don't know why this commit is not in the r24.05 (and r24.04, it seems) branch, as the last commit on that branch dates back to May 29th, whereas this pull request was merged on May 8th.
Cheers
After making change from #7194 and trying to build again, got the following error:
/usr/bin/ld: libtriton_tensorrtllm_common.so: undefined reference to `tensorrt_llm::batch_manager::GptManager::GptManager(std::filesystem::__cxx11::path const&, tensorrt_llm::batch_manager::TrtGptModelType, int, tensorrt_llm::batch_manager::batch_scheduler::SchedulerPolicy, std::function<std::__cxx11::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > > (int)>, std::function<void (unsigned long, std::__cxx11::list<tensorrt_llm::batch_manager::NamedTensor, std::allocator<tensorrt_llm::batch_manager::NamedTensor> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, std::function<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > ()>, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, tensorrt_llm::batch_manager::TrtGptModelOptionalParams const&, std::optional<unsigned long>, std::optional<int>, bool)'
/usr/bin/ld: libtriton_tensorrtllm_common.so: undefined reference to `tensorrt_llm::batch_manager::NamedTensor::NamedTensor(nvinfer1::DataType, std::vector<long, std::allocator
Description
While building from source, the build fails when the tensorrt_llm backend is chosen.
Triton Information
What version of Triton are you using? r24.04
Are you using the Triton container or did you build it yourself? Building from source
To Reproduce
Steps to reproduce the behavior:
1) Check out the r24.04 branch of server
2) Run: ./build.py -v --backend=python --enable-logging --endpoint=http --enable-tracing --enable-stats --enable-gpu --backend=tensorrtllm
This gives the error:
CMake Error at tensorrt_llm/CMakeLists.txt:107 (message): The batch manager library is truncated or incomplete. This is usually caused by using Git LFS (Large File Storage) incorrectly. Please try running command
git lfs install && git lfs pull
So we tried adding:
self.cmd(f"cd {subdir} && git submodule init && git submodule update --merge && git lfs install && git lfs pull && cd ..", check_exitcode=True,)
after the git clone step here: https://github.com/triton-inference-server/server/blob/bf430f8589c82c57cc28e64be456c63a65ce7664/build.py#L325
but this did not help
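One way to confirm the LFS hypothesis (a general Git LFS check, not specific to this repo): when `git lfs pull` has not run, the file on disk is just a tiny text pointer beginning with an LFS spec line, so inspecting the first bytes of the batch manager library tells you whether the real binary was ever fetched. A self-contained sketch using a simulated pointer file:

```shell
# Simulate what an unfetched Git LFS file looks like on disk: a small text
# pointer instead of the real binary. Checking the first bytes for the LFS
# spec line reveals whether `git lfs pull` actually materialized the file.
tmp=$(mktemp)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:0000\nsize 123\n' > "$tmp"
if head -c 200 "$tmp" | grep -q 'git-lfs.github.com/spec'; then
    echo "LFS pointer detected: run 'git lfs install && git lfs pull' in this repo"
fi
rm -f "$tmp"
```

Against the real checkout you would run the same head/grep check on the batch manager static library under cpp/tensorrt_llm/batch_manager/ (exact filename and architecture subdirectory are assumptions; they vary by release).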
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well). NA
Expected behavior The build should have completed successfully, with no errors, and the Docker image should have been ready
Additional Details: Build was attempted using the steps given here: https://github.com/triton-inference-server/tensorrtllm_backend/tree/main#option-1-build-via-the-buildpy-script-in-server-repo
But this failed with the following error:
cp: cannot stat '/tmp/tritonbuild/tensorrtllm/build/triton_tensorrtllm_worker': No such file or directory error: build failed