Device specs

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:00:05.0 Off |                    0 |
| N/A   31C    P0              59W / 400W |    174MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
How to reproduce it
After running all the given commands to build the Docker image, I went inside the container and ran llama.py in the examples/ folder. Here are the error logs:
[TensorRT-LLM][WARNING] Parameter max_prompt_embedding_table_size cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_prompt_embedding_table_size' not found
[TensorRT-LLM][WARNING] Parameter gather_all_token_logits cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'gather_all_token_logits' not found
[TensorRT-LLM][WARNING] Parameter gather_all_token_logits cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'gather_all_token_logits' not found
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][INFO] Loaded engine size: 12856 MiB
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 13276, GPU 14063 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 13276, GPU 14071 (MiB)
[TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.4
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12852, now: CPU 0, GPU 12852 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 13276, GPU 14127 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 13276, GPU 14135 (MiB)
[TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.4
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 12852 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 13308, GPU 14151 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 13308, GPU 14161 (MiB)
[TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.4
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 12852 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 13341, GPU 14179 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 13341, GPU 14189 (MiB)
[TensorRT-LLM][WARNING] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.4
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 12852 (MiB)
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] CUDA runtime error in cudaOccupancyMaxActiveBlocksPerMultiprocessor(&num_blocks_per_sm, mmha::masked_multihead_attention_kernel<T, T_cache, KVCacheBuffer, Dh, THDS_PER_BLOCK, KernelParamsType::DO_CROSS_ATTENTION, HAS_BEAMS, DO_MULTI_BLOCK>, THDS_PER_BLOCK, 0): no kernel image is available for execution on the device (/src/tensorrt_llm/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionLaunch.h:206)
1 0x7fc8b702a564 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x55564) [0x7fc8b702a564]
2 0x7fc8b7401622 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x42c622) [0x7fc8b7401622]
3 0x7fc8b7086545 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xb1545) [0x7fc8b7086545]
4 0x7fc8b70971f9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xc21f9) [0x7fc8b70971f9]
5 0x7fc8b70a19dd /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xcc9dd) [0x7fc8b70a19dd]
6 0x7fc8b70a3c6a /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xcec6a) [0x7fc8b70a3c6a]
7 0x7fc8b709c7ad tensorrt_llm::plugins::GPTAttentionPlugin::enqueue(nvinfer1::PluginTensorDesc const*, nvinfer1::PluginTensorDesc const*, void const* const*, void* const*, void*, CUstream_st*) + 189
8 0x7fc9116b6ba9 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10cdba9) [0x7fc9116b6ba9]
9 0x7fc91168c6af /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a36af) [0x7fc91168c6af]
10 0x7fc91168e320 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a5320) [0x7fc91168e320]
11 0x7fc8da710d2f tensorrt_llm::runtime::GptSession::executeGenerationStep(int, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager*, std::vector<bool, std::allocator<bool> >&) + 1903
12 0x7fc8da71261e tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3134
13 0x7fc8da7137e1 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 3105
14 0x7fc8da6aa949 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xcd949) [0x7fc8da6aa949]
15 0x7fc8da691bc7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb4bc7) [0x7fc8da691bc7]
16 0x5566a423be0e python3(+0x15fe0e) [0x5566a423be0e]
17 0x5566a42325eb _PyObject_MakeTpCall + 603
18 0x5566a424a7bb python3(+0x16e7bb) [0x5566a424a7bb]
19 0x5566a422a8a2 _PyEval_EvalFrameDefault + 24914
20 0x5566a424a4e1 python3(+0x16e4e1) [0x5566a424a4e1]
21 0x5566a424b192 PyObject_Call + 290
22 0x5566a42272c1 _PyEval_EvalFrameDefault + 11121
23 0x5566a4315e56 python3(+0x239e56) [0x5566a4315e56]
24 0x5566a4315cf6 PyEval_EvalCode + 134
25 0x5566a43407d8 python3(+0x2647d8) [0x5566a43407d8]
26 0x5566a433a0bb python3(+0x25e0bb) [0x5566a433a0bb]
27 0x5566a4340525 python3(+0x264525) [0x5566a4340525]
28 0x5566a433fa08 _PyRun_SimpleFileObject + 424
29 0x5566a433f653 _PyRun_AnyFileObject + 67
30 0x5566a433241e Py_RunMain + 702
31 0x5566a4308cad Py_BytesMain + 45
32 0x7fca0e41bd90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fca0e41bd90]
33 0x7fca0e41be40 __libc_start_main + 128
34 0x5566a4308ba5 _start + 37
[e5d1ed2681d9:00503] *** Process received signal ***
[e5d1ed2681d9:00503] Signal: Aborted (6)
[e5d1ed2681d9:00503] Signal code: (-6)
[e5d1ed2681d9:00503] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fca0e434520]
[e5d1ed2681d9:00503] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fca0e4889fc]
[e5d1ed2681d9:00503] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fca0e434476]
[e5d1ed2681d9:00503] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fca0e41a7f3]
[e5d1ed2681d9:00503] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7fc951876b9e]
[e5d1ed2681d9:00503] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7fc95188220c]
[e5d1ed2681d9:00503] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9)[0x7fc9518811e9]
[e5d1ed2681d9:00503] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99)[0x7fc951881959]
[e5d1ed2681d9:00503] [ 8] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(+0x16884)[0x7fca0cb3e884]
[e5d1ed2681d9:00503] [ 9] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_RaiseException+0x311)[0x7fca0cb3ef41]
[e5d1ed2681d9:00503] [10] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_throw+0x3b)[0x7fc9518824cb]
[e5d1ed2681d9:00503] [11] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x55596)[0x7fc8b702a596]
[e5d1ed2681d9:00503] [12] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x42c622)[0x7fc8b7401622]
[e5d1ed2681d9:00503] [13] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xb1545)[0x7fc8b7086545]
[e5d1ed2681d9:00503] [14] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xc21f9)[0x7fc8b70971f9]
[e5d1ed2681d9:00503] [15] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xcc9dd)[0x7fc8b70a19dd]
[e5d1ed2681d9:00503] [16] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xcec6a)[0x7fc8b70a3c6a]
[e5d1ed2681d9:00503] [17] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins18GPTAttentionPlugin7enqueueEPKN8nvinfer116PluginTensorDescES5_PKPKvPKPvSA_P11CUstream_st+0xbd)[0x7fc8b709c7ad]
[e5d1ed2681d9:00503] [18] /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10cdba9)[0x7fc9116b6ba9]
[e5d1ed2681d9:00503] [19] /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a36af)[0x7fc91168c6af]
[e5d1ed2681d9:00503] [20] /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a5320)[0x7fc91168e320]
[e5d1ed2681d9:00503] [21] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(_ZN12tensorrt_llm7runtime10GptSession21executeGenerationStepEiRKSt6vectorINS0_15GenerationInputESaIS3_EERS2_INS0_16GenerationOutputESaIS8_EERKS2_IiSaIiEEPNS_13batch_manager16kv_cache_manager14KVCacheManagerERS2_IbSaIbEE+0x76f)[0x7fc8da710d2f]
[e5d1ed2681d9:00503] [22] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(_ZN12tensorrt_llm7runtime10GptSession15generateBatchedERSt6vectorINS0_16GenerationOutputESaIS3_EERKS2_INS0_15GenerationInputESaIS7_EERKNS0_14SamplingConfigERKSt8functionIFvibEE+0xc3e)[0x7fc8da71261e]
[e5d1ed2681d9:00503] [23] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(_ZN12tensorrt_llm7runtime10GptSession8generateERNS0_16GenerationOutputERKNS0_15GenerationInputERKNS0_14SamplingConfigE+0xc21)[0x7fc8da7137e1]
[e5d1ed2681d9:00503] [24] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xcd949)[0x7fc8da6aa949]
[e5d1ed2681d9:00503] [25] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb4bc7)[0x7fc8da691bc7]
[e5d1ed2681d9:00503] [26] python3(+0x15fe0e)[0x5566a423be0e]
[e5d1ed2681d9:00503] [27] python3(_PyObject_MakeTpCall+0x25b)[0x5566a42325eb]
[e5d1ed2681d9:00503] [28] python3(+0x16e7bb)[0x5566a424a7bb]
[e5d1ed2681d9:00503] [29] python3(_PyEval_EvalFrameDefault+0x6152)[0x5566a422a8a2]
[e5d1ed2681d9:00503] *** End of error message ***
Aborted (core dumped)
I am also guessing that #58 reports a similar issue.
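For context, here is my (hypothetical, not part of TensorRT-LLM) understanding of the error: "no kernel image is available for execution on the device" means the loaded binary contains neither SASS compiled for the device's SM architecture (the A100 is sm_80) nor PTX for an equal-or-older architecture that the driver could JIT-compile. The helper below is only a sketch of that compatibility rule; `kernel_image_available` and its inputs are invented names for illustration.

```python
# Hypothetical sketch (not TensorRT-LLM code): decide whether a binary built
# for `build_archs` can run on a device with architecture `device_arch`.
def kernel_image_available(device_arch: str, build_archs: list[str]) -> bool:
    """True if the build contains SASS for this exact device, or PTX
    (compute_XY) for an equal-or-older architecture the driver can JIT."""
    dev = int(device_arch.removeprefix("sm_"))
    for arch in build_archs:
        if arch == device_arch:
            return True  # exact SASS match
        if arch.startswith("compute_") and int(arch.removeprefix("compute_")) <= dev:
            return True  # forward-compatible PTX, JIT-compiled by the driver
    return False

# An A100 (sm_80) against a build that only targets sm_86/sm_89 would hit
# exactly this failure mode:
print(kernel_image_available("sm_80", ["sm_86", "sm_89"]))       # False
print(kernel_image_available("sm_80", ["sm_80", "compute_80"]))  # True
```

If this reading is right, the fix would be rebuilding the image/engine with the target architecture matching the A100.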