triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Can't build docker image #357

Open mallorbc opened 6 months ago

mallorbc commented 6 months ago

System Info

Ryzen 5950x, Ubuntu 22.04, 2 RTX 3090s, main branch

Who can help?

@byshiue @sch

Information

Tasks

Reproduction

run this bash script:

#!/bin/sh
# Update the submodules
cd tensorrtllm_backend
git lfs install
git submodule update --init --recursive

# Use the Dockerfile to build the backend in a container
# For x86_64
DOCKER_BUILDKIT=1 docker build --no-cache -t triton_trt_llm_main_test -f dockerfile/Dockerfile.trt_llm_backend .

Expected behavior

I expect it to successfully build.

Actual behavior

Errors out with this:

150.4 [ 43%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/pagedKVCubin/fmha_v2_flash_attention_fp16_64_128_S_104_pagedKV_sm89.cubin.cpp.o
150.5 /app/tensorrt_llm/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/pagedKVCubin/fmha_v2_flash_attention_fp16_128_128_S_64_pagedKV_sm90.cubin.cpp:14922:1: sorry, unimplemented: non-trivial designated initializers not supported
150.5 14922 | };
150.5       | ^
150.5 [ 43%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/pagedKVCubin/fmha_v2_flash_attention_fp16_64_128_S_104_pagedKV_sm90.cubin.cpp.o
150.5 gmake[3]: *** [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:5648: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/pagedKVCubin/fmha_v2_flash_attention_fp16_128_128_S_64_pagedKV_sm90.cubin.cpp.o] Error 1
150.5 gmake[3]: *** Waiting for unfinished jobs....
150.7 gmake[2]: *** [CMakeFiles/Makefile2:838: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/all] Error 2
150.7 gmake[2]: *** Waiting for unfinished jobs....
151.8 [ 43%] Built target common_src
157.2 [ 43%] Built target runtime_src
174.3 nvcc error : 'cicc' died due to signal 11 (Invalid memory reference)
174.3 nvcc error : 'cicc' core dumped
174.3 gmake[3]: *** [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/cutlass_src.dir/build.make:398: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/cutlass_src.dir/moe_gemm/moe_gemm_kernels_bf16_uint8.cu.o] Error 139
176.2 In file included from tmpxft_000005d9_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.stub.c:1:
176.2 /tmp/tmpxft_000005d9_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.stub.c:39: internal compiler error: Segmentation fault
176.2    39 | static void __device_stub__ZN7cutlass6KernelINS_4gemm6kernel11GemmFpAIntBI... [several kilobytes of mangled and demangled cutlass::gemm::kernel::GemmFpAIntB<...> template signature elided] ...::cutlass::arch::Sm80, (bool)1> >)));}namespace cutlass{
176.2       |
176.2 0xe34ddb internal_error(char const*, ...)
176.2 	???:0
176.2 0x13ac413 ggc_set_mark(void const*)
176.2 	???:0
176.2 0x13a969d gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13a9f70 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13a9fda gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13aa2d9 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13b438d gt_ggc_mx_vec_tree_va_gc_(void*)
176.2 	???:0
176.2 0x13b4872 gt_ggc_mx_lang_type(void*)
176.2 	???:0
176.2 0x13aac54 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13a9ff9 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13a9f51 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13a9f36 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13aa039 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13a9fda gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13aa2d9 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13b438d gt_ggc_mx_vec_tree_va_gc_(void*)
176.2 	???:0
176.2 0x13b4872 gt_ggc_mx_lang_type(void*)
176.2 	???:0
176.2 0x13aac54 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13aa4bd gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 0x13aa014 gt_ggc_mx_lang_tree_node(void*)
176.2 	???:0
176.2 Please submit a full bug report,
176.2 with preprocessed source if appropriate.
176.2 Please include the complete backtrace with any bug report.
176.2 See file:///usr/share/doc/gcc-11/README.Bugs for instructions.
176.3 gmake[3]: *** [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/cutlass_src.dir/build.make:104: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/cutlass_src.dir/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu.o] Error 1
377.3 gmake[2]: *** [CMakeFiles/Makefile2:864: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/cutlass_src.dir/all] Error 2
377.3 gmake[1]: *** [CMakeFiles/Makefile2:793: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
377.3 gmake: *** [Makefile:192: tensorrt_llm] Error 2
377.3 Traceback (most recent call last):
377.3   File "/app/tensorrt_llm/scripts/build_wheel.py", line 332, in <module>
377.3     main(**vars(args))
377.3   File "/app/tensorrt_llm/scripts/build_wheel.py", line 166, in main
377.3     build_run(
377.3   File "/usr/lib/python3.10/subprocess.py", line 526, in run
377.3     raise CalledProcessError(retcode, process.args,
377.3 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 32 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings ' returned non-zero exit status 2.

Dockerfile.trt_llm_backend:47

  45 |     COPY scripts scripts
  46 |     COPY tensorrt_llm tensorrt_llm
  47 | >>> RUN cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root="${TRT_ROOT}" -i -c && cd ..
  48 |
  49 |     FROM trt_llm_builder as trt_llm_backend_builder

ERROR: failed to solve: process "/bin/sh -c cd tensorrt_llm && python3 scripts/build_wheel.py --trt_root=\"${TRT_ROOT}\" -i -c && cd .." did not complete successfully: exit code: 1
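
For context on the first diagnostic: g++'s support for C++ designated initializers is partial, and forms it considers "non-trivial" (for example, designators out of declaration order) abort with exactly the "sorry, unimplemented: non-trivial designated initializers not supported" message seen above. Whether that is what the generated .cubin.cpp file hits here is a guess, since the offending initializer isn't shown; the sketch below only demonstrates the supported in-order case against a made-up struct (KernelMeta is not from this repo):

#!/bin/sh
# Hypothetical standalone sketch: in-declaration-order designated
# initializers are the case g++ supports; reordering the two designators
# below is what triggers the "sorry, unimplemented" diagnostic on g++
# (clang accepts both orders, with a warning for the reordered one).
cat > /tmp/desig_init_ok.cpp <<'EOF'
struct KernelMeta { int smVersion; int headSize; };
// In declaration order: supported by g++.
KernelMeta meta{.smVersion = 90, .headSize = 64};
int main() { return (meta.smVersion == 90 && meta.headSize == 64) ? 0 : 1; }
EOF
g++ -std=c++20 /tmp/desig_init_ok.cpp -o /tmp/desig_init_ok && /tmp/desig_init_ok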

Additional notes

For some odd reason, the same build works on an A100 VM.
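
One concrete machine difference worth ruling out is host RAM: the traceback shows cmake was invoked with --parallel 32 (matching the 5950x's thread count), and cicc dying with signal 11 under many parallel nvcc jobs is a common out-of-memory symptom. A rough sanity check, assuming on the order of 4 GB per compile job (a guess, not a documented figure):

#!/bin/sh
# Compare available RAM against the parallel job count the build will use.
# The ~4 GB-per-nvcc-job estimate is an assumption, not from this repo.
JOBS=$(nproc)
AVAIL_GB=$(awk '/MemAvailable/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
NEEDED_GB=$((JOBS * 4))
echo "jobs=$JOBS available=${AVAIL_GB}GB estimated_need=${NEEDED_GB}GB"
if [ "$AVAIL_GB" -lt "$NEEDED_GB" ]; then
  echo "Consider lowering build parallelism before retrying."
fi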

byshiue commented 6 months ago

Could you try running git submodule update --init --recursive in the tensorrt_llm folder? Also, if you can build tensorrt_llm on one machine, you should be able to install the resulting wheel on another machine directly.
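
Spelled out, the suggested check looks like this (directory names assumed from the reproduction script above; run it from the directory containing the checkout):

#!/bin/sh
# Update nested submodules from inside the tensorrt_llm folder, then list
# any that are still uninitialized (git prefixes those entries with '-').
cd tensorrtllm_backend/tensorrt_llm
git submodule update --init --recursive
git submodule status --recursive | grep '^-' || echo "all submodules initialized"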

mallorbc commented 6 months ago

I will try that command and then try rebuilding. I will then share what I find.

I agree, it is very odd that I can build the docker image on one machine but not another.