huggingface / optimum-nvidia

Apache License 2.0

No kernel image is available for execution on the device #17

Closed: Quang-elec44 closed this issue 8 months ago

Quang-elec44 commented 9 months ago

Hi, I am currently testing the TinyLlama/TinyLlama-1.1B-Chat-v0.3 model on an NVIDIA Tesla T4 with Docker image version 0.1.0b1. Unfortunately, inference fails with an error; the full error log is below. Can you help me out? Thanks in advance.

RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cubTempStorageSize, logProbs, (T*) nullptr, idVals, (int*) nullptr, vocabSize * batchSize, batchSize, beginOffsetBuf, offsetBuf + 1, 0, sizeof(T) * 8, stream): no kernel image is available for execution on the device (/opt/optimum-nvidia/third-party/tensorrt-llm/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:322)
1       0x7f94b0260b4b /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x7ab4b) [0x7f94b0260b4b]
2       0x7f94b03f431a void tensorrt_llm::kernels::invokeBatchTopPSampling<float>(void*, unsigned long&, unsigned long&, int**, int*, tensorrt_llm::kernels::FinishedState const*, tensorrt_llm::kernels::FinishedState*, float*, float*, float const*, int const*, int*, int*, curandStateXORWOW*, int, unsigned long, int const*, float, float const*, CUstream_st*, bool const*) + 2202
3       0x7f94b03f4394 void tensorrt_llm::kernels::invokeTopPSampling<float>(void*, unsigned long&, unsigned long&, int**, int*, tensorrt_llm::kernels::FinishedState const*, tensorrt_llm::kernels::FinishedState*, float*, float*, float const*, int const*, int*, int*, curandStateXORWOW*, int, unsigned long, int const*, float, CUstream_st*, bool const*) + 68
4       0x7f94b03b6ac4 tensorrt_llm::layers::TopPSamplingLayer<float>::allocateBuffer(unsigned long, std::vector<float, std::allocator<float> > const&) + 196
5       0x7f94b03b7704 tensorrt_llm::layers::TopPSamplingLayer<float>::setup(unsigned long, tensorrt_llm::layers::TopPSamplingLayer<float>::SetupParams const&) + 196
6       0x7f94b038d51e tensorrt_llm::layers::DynamicDecodeLayer<float>::setup(unsigned long, unsigned long, tensorrt_llm::layers::DynamicDecodeLayer<float>::SetupParams const&) + 1086
7       0x7f94b0364d52 tensorrt_llm::runtime::GptDecoder<float>::setup(tensorrt_llm::runtime::SamplingConfig const&, unsigned long, int) + 1250
8       0x7f94b0348649 tensorrt_llm::runtime::StatefulGptDecoder::newBatch(tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::GenerationOutput const&, tensorrt_llm::runtime::SamplingConfig const&) + 217
9       0x7f94b030868f tensorrt_llm::runtime::GptSession::initDecoder(tensorrt_llm::runtime::ITensor&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::GenerationOutput const&, tensorrt_llm::runtime::SamplingConfig const&, int) const + 1151
10      0x7f94b030e447 tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 1271
11      0x7f94b030fc71 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 3105
12      0x7f94b02bb519 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xd5519) [0x7f94b02bb519]
13      0x7f94b029774f /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb174f) [0x7f94b029774f]
14      0x559330a80e0e /usr/bin/python(+0x15fe0e) [0x559330a80e0e]
15      0x559330a775eb _PyObject_MakeTpCall + 603
16      0x559330a8f7bb /usr/bin/python(+0x16e7bb) [0x559330a8f7bb]
17      0x559330a6f8a2 _PyEval_EvalFrameDefault + 24914
18      0x559330a8f4e1 /usr/bin/python(+0x16e4e1) [0x559330a8f4e1]
19      0x559330a6b0d1 _PyEval_EvalFrameDefault + 6529
20      0x559330a8f4e1 /usr/bin/python(+0x16e4e1) [0x559330a8f4e1]
21      0x559330a90192 PyObject_Call + 290
22      0x559330a6c2c1 _PyEval_EvalFrameDefault + 11121
23      0x559330a8170c _PyFunction_Vectorcall + 124
24      0x559330a7682d _PyObject_FastCallDictTstate + 365
25      0x559330a8c54c _PyObject_Call_Prepend + 92
26      0x559330ba51e0 /usr/bin/python(+0x2841e0) [0x559330ba51e0]
27      0x559330a775eb _PyObject_MakeTpCall + 603
28      0x559330a70908 _PyEval_EvalFrameDefault + 29112
29      0x559330b5ae56 /usr/bin/python(+0x239e56) [0x559330b5ae56]
30      0x559330b5acf6 PyEval_EvalCode + 134
31      0x559330b60b0d /usr/bin/python(+0x23fb0d) [0x559330b60b0d]
32      0x559330a81969 /usr/bin/python(+0x160969) [0x559330a81969]
33      0x559330a69e0d _PyEval_EvalFrameDefault + 1725
34      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
35      0x559330a6bed6 _PyEval_EvalFrameDefault + 10118
36      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
37      0x559330a6bed6 _PyEval_EvalFrameDefault + 10118
38      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
39      0x559330b7b13f /usr/bin/python(+0x25a13f) [0x559330b7b13f]
40      0x559330a8cf8a /usr/bin/python(+0x16bf8a) [0x559330a8cf8a]
41      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
42      0x559330a8170c _PyFunction_Vectorcall + 124
43      0x559330a69e0d _PyEval_EvalFrameDefault + 1725
44      0x559330a8170c _PyFunction_Vectorcall + 124
45      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
46      0x559330a8f4e1 /usr/bin/python(+0x16e4e1) [0x559330a8f4e1]
47      0x559330a90192 PyObject_Call + 290
48      0x559330a6c2c1 _PyEval_EvalFrameDefault + 11121
49      0x559330a8f4e1 /usr/bin/python(+0x16e4e1) [0x559330a8f4e1]
50      0x559330a6b0d1 _PyEval_EvalFrameDefault + 6529
51      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
52      0x559330a6bed6 _PyEval_EvalFrameDefault + 10118
53      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
54      0x559330a6bed6 _PyEval_EvalFrameDefault + 10118
55      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
56      0x559330a6bed6 _PyEval_EvalFrameDefault + 10118
57      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
58      0x559330a6bed6 _PyEval_EvalFrameDefault + 10118
59      0x559330a9e880 /usr/bin/python(+0x17d880) [0x559330a9e880]
60      0x7f957fed128e /usr/lib/python3.10/lib-dynload/_asyncio.cpython-310-x86_64-linux-gnu.so(+0x928e) [0x7f957fed128e]
61      0x7f957fed249b /usr/lib/python3.10/lib-dynload/_asyncio.cpython-310-x86_64-linux-gnu.so(+0xa49b) [0x7f957fed249b]
62      0x559330a80854 /usr/bin/python(+0x15f854) [0x559330a80854]
63      0x559330b5c8a5 /usr/bin/python(+0x23b8a5) [0x559330b5c8a5]
64      0x559330bd62a2 /usr/bin/python(+0x2b52a2) [0x559330bd62a2]
65      0x559330a7450b /usr/bin/python(+0x15350b) [0x559330a7450b]
66      0x559330a6c2c1 _PyEval_EvalFrameDefault + 11121
67      0x559330a8170c _PyFunction_Vectorcall + 124
68      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
69      0x559330a8170c _PyFunction_Vectorcall + 124
70      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
71      0x559330a8170c _PyFunction_Vectorcall + 124
72      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
73      0x559330a8170c _PyFunction_Vectorcall + 124
74      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
75      0x559330a8170c _PyFunction_Vectorcall + 124
76      0x559330a69f52 _PyEval_EvalFrameDefault + 2050
77      0x559330a8f4e1 /usr/bin/python(+0x16e4e1) [0x559330a8f4e1]
78      0x559330a6f8a2 _PyEval_EvalFrameDefault + 24914
79      0x559330b5ae56 /usr/bin/python(+0x239e56) [0x559330b5ae56]
80      0x559330b5acf6 PyEval_EvalCode + 134
81      0x559330b60b0d /usr/bin/python(+0x23fb0d) [0x559330b60b0d]
82      0x559330a81969 /usr/bin/python(+0x160969) [0x559330a81969]
83      0x559330a69e0d _PyEval_EvalFrameDefault + 1725
84      0x559330a8170c _PyFunction_Vectorcall + 124
85      0x559330a69e0d _PyEval_EvalFrameDefault + 1725
86      0x559330a8170c _PyFunction_Vectorcall + 124
87      0x559330b785dd /usr/bin/python(+0x2575dd) [0x559330b785dd]
88      0x559330b77288 Py_RunMain + 296
89      0x559330b4dcad Py_BytesMain + 45
90      0x7f95810b6d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f95810b6d90]
91      0x7f95810b6e40 __libc_start_main + 128
92      0x559330b4dba5 _start + 37
mfuntowicz commented 9 months ago

Indeed, the Docker image doesn't include support for NVIDIA T4 GPUs.

I'll add this architecture in the upcoming maintenance release.
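For context, "no kernel image is available for execution on the device" is the CUDA error raised when a binary contains no code the GPU can run. The Tesla T4 is a Turing GPU with compute capability 7.5 (sm_75). The sketch below is a simplified model of CUDA's documented compatibility rules: a cubin built for sm_XY runs on GPUs with the same major version and an equal or higher minor version, and PTX built for compute_XY can be JIT-compiled for any GPU of equal or higher capability. The function name `find_compatible_target` and the target lists are hypothetical; the thread does not say which architectures the 0.1.0b1 image was actually built for.

```python
from typing import Iterable, Optional, Tuple


def parse_target(target: str) -> Tuple[str, int, int]:
    """Split a plain target such as 'sm_75' or 'compute_80' into (kind, major, minor).

    Suffixed targets like 'sm_90a' are not handled by this simplified sketch.
    """
    kind, digits = target.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def find_compatible_target(
    device_cc: Tuple[int, int], targets: Iterable[str]
) -> Optional[str]:
    """Return the first build target usable on a GPU with compute capability
    `device_cc`, or None when nothing matches (the case CUDA reports as
    'no kernel image is available for execution on the device').

    Hypothetical helper: models only the cubin/PTX compatibility rules,
    not driver-version requirements.
    """
    dev_major, dev_minor = device_cc
    for target in targets:
        kind, major, minor = parse_target(target)
        if kind == "sm":
            # Cubin: same major version, equal or lower minor version than the device.
            if major == dev_major and minor <= dev_minor:
                return target
        elif kind == "compute":
            # PTX: the driver can JIT-compile it for any equal-or-newer GPU.
            if (major, minor) <= (dev_major, dev_minor):
                return target
        else:
            raise ValueError(f"unrecognized target: {target}")
    return None


if __name__ == "__main__":
    TESLA_T4 = (7, 5)
    # Hypothetical build that only targets Ampere/Ada/Hopper GPUs.
    newer_only = ["sm_80", "sm_86", "sm_89", "sm_90"]
    print(find_compatible_target(TESLA_T4, newer_only))  # None
    # Adding a Turing target makes a kernel image available on the T4.
    print(find_compatible_target(TESLA_T4, newer_only + ["sm_75"]))  # sm_75
```

On a live machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the device's `(major, minor)` tuple, which could be passed as `device_cc`.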

mfuntowicz commented 9 months ago

Will be fixed by this PR: https://github.com/huggingface/optimum-nvidia/pull/35

mfuntowicz commented 8 months ago

@Quang-elec44 we've extended support to your device (and more) in the latest release, 0.1.0b2. Give it a try 🤗

Quang-elec44 commented 8 months ago

@mfuntowicz Thank you so much. I'll try it and report back.