nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: vdb_upload example pipeline triggers an internal error in Triton #1649

Open dagardner-nv opened 4 months ago

Version

24.03

Which installation method(s) does this occur on?

Source

Describe the bug.

The crash appears to depend on the input RSS data: setting a different value for --stop_after avoids it. It is reproducible with both Triton 23.06 and 24.01.

Minimum reproducible example

docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2

or

docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:24.01-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model all-MiniLM-L6-v2

Then, with either Triton server running, start the pipeline:

python examples/llm/main.py vdb_upload pipeline --stop_after=1024
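
Since the failure depends on both the Triton version and the --stop_after value, it can help to generate the reproduction commands from parameters instead of editing them by hand. The following is a minimal sketch, not part of Morpheus: the helper names `triton_command` and `pipeline_command` are hypothetical, the flags are copied from the commands above, and the current working directory stands in for `$PWD`.

```python
import os
import shlex


def triton_command(tag, models_dir=None):
    """Build the docker command from the report for a given Triton image tag."""
    # Assumption: the models directory lives under the current directory,
    # mirroring `$PWD/models` in the original command.
    models_dir = models_dir or os.path.join(os.getcwd(), "models")
    return [
        "docker", "run", "--rm", "-ti", "--gpus=all",
        "-p8000:8000", "-p8001:8001", "-p8002:8002",
        "-v", f"{models_dir}:/models",
        f"nvcr.io/nvidia/tritonserver:{tag}",
        "tritonserver",
        "--model-repository=/models/triton-model-repo",
        "--exit-on-error=false",
        "--model-control-mode=explicit",
        "--load-model", "all-MiniLM-L6-v2",
    ]


def pipeline_command(stop_after):
    """Build the pipeline command from the report for a given --stop_after."""
    return [
        "python", "examples/llm/main.py", "vdb_upload", "pipeline",
        f"--stop_after={stop_after}",
    ]


if __name__ == "__main__":
    # The two image tags named in the report.
    for tag in ("23.06-py3", "24.01-py3"):
        print(shlex.join(triton_command(tag)))
    # 1024 is the value that triggers the error in the report.
    print(shlex.join(pipeline_command(1024)))
```

The script only prints the commands. Running them, and restarting Triton after a crash, is left to the reader.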

Relevant log output

2024-04-19 18:02:30.801178167 [E:onnxruntime:log, tensorrt_execution_provider.h:73 log] [2024-04-19 18:02:30   ERROR] 10: Could not find any implementation for node {ForeignNode[inner_model.embeddings.position_embeddings.weight.../Transpose_4]}.
2024-04-19 18:02:30.816268404 [E:onnxruntime:log, tensorrt_execution_provider.h:73 log] [2024-04-19 18:02:30   ERROR] 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[inner_model.embeddings.position_embeddings.weight.../Transpose_4]}.)
Signal (11) received.
 0# 0x000055645A592E6D in tritonserver
 1# 0x00007FED569BF520 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007FECDDBC42B6 in /opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_tensorrt.so
 3# 0x00007FECDDBC9CBB in /opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_tensorrt.so
 4# 0x00007FEC3F079C15 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
 5# 0x00007FEC3F12E6BC in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
 6# 0x00007FEC3F1250A9 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
 7# 0x00007FEC3F131AA5 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
 8# 0x00007FEC3F12D802 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
 9# 0x00007FEC3F0F2BAF in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
10# 0x00007FEC3F0F5CC7 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
11# 0x00007FEC3F0F6306 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
12# 0x00007FEC3E988CB3 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
13# 0x00007FEC3E98905A in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
14# 0x00007FEC3E91549B in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
15# 0x00007FECDFA6370D in /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
16# 0x00007FECDFA7C6FA in /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
17# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
18# 0x00007FED573A4044 in /opt/tritonserver/bin/../lib/libtritonserver.so
19# 0x00007FED573A4360 in /opt/tritonserver/bin/../lib/libtritonserver.so
20# 0x00007FED574A72E1 in /opt/tritonserver/bin/../lib/libtritonserver.so
21# 0x00007FED573A8144 in /opt/tritonserver/bin/../lib/libtritonserver.so
22# 0x00007FED56C812B3 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
23# 0x00007FED56A11B43 in /usr/lib/x86_64-linux-gnu/libc.so.6
24# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
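
To see at a glance which shared objects the crash passes through, the backtrace can be grouped by module. The snippet below is an illustrative sketch rather than a Triton tool: the function name `frames_by_module` and the regular expression are assumptions based on the frame format shown above, and the sample is a short excerpt of that trace.

```python
import re
from collections import Counter

# Matches frames such as " 2# 0x00007F... in /path/lib.so" and
# "17# TRITONBACKEND_ModelInstanceExecute in /path/lib.so".
FRAME_RE = re.compile(r"^\s*(\d+)#\s+(\S+)\s+in\s+(\S+)\s*$")


def frames_by_module(trace):
    """Return (module basename, frame count) pairs in order of first appearance."""
    counts = Counter()
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            # Keep only the file name, dropping the directory part.
            module = match.group(3).rsplit("/", 1)[-1]
            counts[module] += 1
    return list(counts.items())


# Excerpt of the backtrace above (frames 0 to 4).
SAMPLE = """\
Signal (11) received.
 0# 0x000055645A592E6D in tritonserver
 1# 0x00007FED569BF520 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007FECDDBC42B6 in /opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_tensorrt.so
 3# 0x00007FECDDBC9CBB in /opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_tensorrt.so
 4# 0x00007FEC3F079C15 in /opt/tritonserver/backends/onnxruntime/libonnxruntime.so
"""

if __name__ == "__main__":
    for module, count in frames_by_module(SAMPLE):
        print(f"{count:3d}  {module}")
```

In the trace above, the first frames after the signal handler (frames 2 and 3) are in libonnxruntime_providers_tensorrt.so, which matches the TensorRT errors logged just before the crash.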

Other/Misc.

No response
