triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

model_repository_manager.cc:1186] failed to load 'bert' version 1: Internal: failed to load model 'bert': PytorchStreamReader failed reading zip archive: failed finding central directory #7497

Open chenchunhui97 opened 1 month ago

chenchunhui97 commented 1 month ago

Description: Triton fails to load the 'bert' model when deploying MacBERT.

Triton Information: I use the official image nvcr.io/nvidia/tritonserver:21.09-py3

NVIDIA Release 21.09 (build 27443074)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0805 10:04:21.726622 1 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA A800 80GB PCIe
I0805 10:04:21.726796 1 metrics.cc:290] Collecting metrics for GPU 1: NVIDIA A800 80GB PCIe
I0805 10:04:21.726811 1 metrics.cc:290] Collecting metrics for GPU 2: NVIDIA A800 80GB PCIe
I0805 10:04:21.726822 1 metrics.cc:290] Collecting metrics for GPU 3: NVIDIA A800 80GB PCIe
I0805 10:04:21.726830 1 metrics.cc:290] Collecting metrics for GPU 4: NVIDIA A800 80GB PCIe
I0805 10:04:21.726843 1 metrics.cc:290] Collecting metrics for GPU 5: NVIDIA A800 80GB PCIe
I0805 10:04:21.726851 1 metrics.cc:290] Collecting metrics for GPU 6: NVIDIA A800 80GB PCIe
I0805 10:04:21.726862 1 metrics.cc:290] Collecting metrics for GPU 7: NVIDIA A800 80GB PCIe
I0805 10:04:21.996246 1 libtorch.cc:1030] TRITONBACKEND_Initialize: pytorch
I0805 10:04:21.996272 1 libtorch.cc:1040] Triton TRITONBACKEND API version: 1.5
I0805 10:04:21.996277 1 libtorch.cc:1046] 'pytorch' TRITONBACKEND API version: 1.5
2024-08-05 10:04:22.131515: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0805 10:04:22.172488 1 tensorflow.cc:2170] TRITONBACKEND_Initialize: tensorflow
I0805 10:04:22.172516 1 tensorflow.cc:2180] Triton TRITONBACKEND API version: 1.5
I0805 10:04:22.172522 1 tensorflow.cc:2186] 'tensorflow' TRITONBACKEND API version: 1.5
I0805 10:04:22.172528 1 tensorflow.cc:2210] backend configuration:
{}
I0805 10:04:22.175720 1 onnxruntime.cc:1997] TRITONBACKEND_Initialize: onnxruntime
I0805 10:04:22.175739 1 onnxruntime.cc:2007] Triton TRITONBACKEND API version: 1.5
I0805 10:04:22.175744 1 onnxruntime.cc:2013] 'onnxruntime' TRITONBACKEND API version: 1.5
I0805 10:04:22.196491 1 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0805 10:04:22.196506 1 openvino.cc:1203] Triton TRITONBACKEND API version: 1.5
I0805 10:04:22.196510 1 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.5
I0805 10:04:24.164709 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f44f4000000' with size 268435456
I0805 10:04:24.880512 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0805 10:04:24.880559 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0805 10:04:24.880580 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 2 with size 67108864
I0805 10:04:24.880589 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 3 with size 67108864
I0805 10:04:24.880599 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 4 with size 67108864
I0805 10:04:24.880608 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 5 with size 67108864
I0805 10:04:24.880617 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 6 with size 67108864
I0805 10:04:24.880626 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 7 with size 67108864
I0805 10:04:29.610384 1 model_repository_manager.cc:1022] loading: bert:1
I0805 10:04:29.722095 1 libtorch.cc:1079] TRITONBACKEND_ModelInitialize: bert (version 1)
I0805 10:04:29.724444 1 libtorch.cc:219] Optimized execution is enabled
I0805 10:04:29.724468 1 libtorch.cc:236] Inference Mode is disabled
I0805 10:04:29.731584 1 libtorch.cc:1120] TRITONBACKEND_ModelInstanceInitialize: bert (device 0)
I0805 10:04:31.545078 1 libtorch.cc:1153] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0805 10:04:31.545141 1 libtorch.cc:1102] TRITONBACKEND_ModelFinalize: delete model state
E0805 10:04:31.545162 1 model_repository_manager.cc:1186] failed to load 'bert' version 1: Internal: failed to load model 'bert': PytorchStreamReader failed reading zip archive: failed finding central directory
Exception raised from valid at /opt/pytorch/pytorch/caffe2/serialize/inline_container.cc:151 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f4658f3063c in /opt/tritonserver/backends/pytorch/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f4658efba28 in /opt/tritonserver/backends/pytorch/libc10.so)
frame #2: caffe2::serialize::PyTorchStreamReader::valid(char const*, char const*) + 0x35b (0x7f4630be3ccb in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #3: caffe2::serialize::PyTorchStreamReader::init() + 0xb2 (0x7f4630be48e2 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #4: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>) + 0x98 (0x7f4630be5848 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #5: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xde (0x7f46323d93ee in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #6: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc2 (0x7f46323db472 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #7: torch::jit::load(std::istream&, c10::optional<c10::Device>) + 0x6a (0x7f46323db55a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)
frame #8: <unknown function> + 0x1a33d (0x7f466809833d in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #9: <unknown function> + 0x1c9e6 (0x7f466809a9e6 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #10: <unknown function> + 0x1ceb2 (0x7f466809aeb2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #11: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f466809b274 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)
frame #12: <unknown function> + 0x2e3c3f (0x7f4675356c3f in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #13: <unknown function> + 0x2e4d71 (0x7f4675357d71 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #14: <unknown function> + 0x2dc603 (0x7f467534f603 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #15: <unknown function> + 0x18baca (0x7f46751feaca in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #16: <unknown function> + 0x1998c1 (0x7f467520c8c1 in /opt/tritonserver/bin/../lib/libtritonserver.so)
frame #17: <unknown function> + 0xd6de4 (0x7f4674bc8de4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #18: <unknown function> + 0x9609 (0x7f4675046609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)
frame #19: clone + 0x43 (0x7f46748b6293 in /usr/lib/x86_64-linux-gnu/libc.so.6)

I0805 10:04:31.545333 1 server.cc:519]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0805 10:04:31.545498 1 server.cc:546]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0805 10:04:31.545994 1 server.cc:589]
+-------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+-------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bert  | 1       | UNAVAILABLE: Internal: failed to load model 'bert': PytorchStreamReader failed reading zip archive: failed finding central directory                                                                                                                                                                                                                                                                                                                                                          |
|       |         | Exception raised from valid at /opt/pytorch/pytorch/caffe2/serialize/inline_container.cc:151 (most recent call first):                                                                                                                                                                                                                                                                                                                                                                        |
|       |         | frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f4658f3063c in /opt/tritonserver/backends/pytorch/libc10.so)                                                                                                                                                                                                                                                                                     |
|       |         | frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f4658efba28 in /opt/tritonserver/backends/pytorch/libc10.so)                                                                                                                                                                                                                                                 |
|       |         | frame #2: caffe2::serialize::PyTorchStreamReader::valid(char const*, char const*) + 0x35b (0x7f4630be3ccb in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                                                                                                                                                                                                                                                              |
|       |         | frame #3: caffe2::serialize::PyTorchStreamReader::init() + 0xb2 (0x7f4630be48e2 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                                                                                                                                                                                                                                                                                        |
|       |         | frame #4: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>) + 0x98 (0x7f4630be5848 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                                                                                                                                                                                                                 |
|       |         | frame #5: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char |
|       |         | > > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xde (0x7f46323d93ee in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                                                                                                                                  |
|       |         | frame #6: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc2 (0x7f46323db472 in /opt/tritonserver/backends/pytorch/libtorch_cpu.so) |
|       |         | frame #7: torch::jit::load(std::istream&, c10::optional<c10::Device>) + 0x6a (0x7f46323db55a in /opt/tritonserver/backends/pytorch/libtorch_cpu.so)                                                                                                                                                                                                                                                                                                                                           |
|       |         | frame #8: <unknown function> + 0x1a33d (0x7f466809833d in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                                                                                                                                                                                                                                                                                                            |
|       |         | frame #9: <unknown function> + 0x1c9e6 (0x7f466809a9e6 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                                                                                                                                                                                                                                                                                                            |
|       |         | frame #10: <unknown function> + 0x1ceb2 (0x7f466809aeb2 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                                                                                                                                                                                                                                                                                                           |
|       |         | frame #11: TRITONBACKEND_ModelInstanceInitialize + 0x374 (0x7f466809b274 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so)                                                                                                                                                                                                                                                                                                                                                          |
|       |         | frame #12: <unknown function> + 0x2e3c3f (0x7f4675356c3f in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                                                                                                                                                                                                                                                                                                                  |
|       |         | frame #13: <unknown function> + 0x2e4d71 (0x7f4675357d71 in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                                                                                                                                                                                                                                                                                                                  |
|       |         | frame #14: <unknown function> + 0x2dc603 (0x7f467534f603 in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                                                                                                                                                                                                                                                                                                                  |
|       |         | frame #15: <unknown function> + 0x18baca (0x7f46751feaca in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                                                                                                                                                                                                                                                                                                                  |
|       |         | frame #16: <unknown function> + 0x1998c1 (0x7f467520c8c1 in /opt/tritonserver/bin/../lib/libtritonserver.so)                                                                                                                                                                                                                                                                                                                                                                                  |
|       |         | frame #17: <unknown function> + 0xd6de4 (0x7f4674bc8de4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)                                                                                                                                                                                                                                                                                                                                                                                          |
|       |         | frame #18: <unknown function> + 0x9609 (0x7f4675046609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0)                                                                                                                                                                                                                                                                                                                                                                                          |
|       |         | frame #19: clone + 0x43 (0x7f46748b6293 in /usr/lib/x86_64-linux-gnu/libc.so.6)                                                                                                                                                                                                                                                                                                                                                                                                               |
+-------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0805 10:04:31.546410 1 tritonserver.cc:1836]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.14.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /models                                                                                                                                                                                |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{2}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{3}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{4}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{5}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{6}    | 67108864                                                                                                                                                                               |
| cuda_memory_pool_byte_size{7}    | 67108864                                                                                                                                                                               |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0805 10:04:31.546430 1 server.cc:249] Waiting for in-flight requests to complete.
I0805 10:04:31.546448 1 server.cc:264] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

To Reproduce

  1. Generate the ONNX model for serving (with torch version 2.1.2).
  2. Launch the bert service using the tritonserver image.
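The "failed finding central directory" error means the file the pytorch backend tried to load is not a valid zip archive: TorchScript models saved with torch.jit.save are zip files, while an ONNX export is a raw protobuf file. A minimal stdlib check can tell the two apart (looks_like_torchscript is a hypothetical helper name, and the model path is a placeholder):

```python
import zipfile

def looks_like_torchscript(path: str) -> bool:
    """Return True if 'path' is a zip archive.

    TorchScript .pt files saved with torch.jit.save are zip archives,
    which is what Triton's pytorch backend (torch.jit.load) expects.
    An ONNX export is a raw protobuf file, not a zip, so loading it
    with the pytorch backend fails with
    'PytorchStreamReader ... failed finding central directory'.
    """
    return zipfile.is_zipfile(path)

# Example (hypothetical path inside the model repository):
# looks_like_torchscript("/models/bert/1/model.pt")
```

If this returns False for the file in your version directory, the file is not a TorchScript archive and the pytorch backend cannot load it.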

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior: the service launches successfully. One more question: how do I change the service ports (from 8000, 8001, 8002 to other customized ports)?

rmccorm4 commented 1 month ago

Hi @chenchunhui97,

generate onnx for server (with torch version 2.1.2 )

If your bert model is an ONNX model, then you should specify the onnxruntime backend in the config.pbtxt; from the logs, however, it looks like pytorch is specified.
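For reference, a minimal config.pbtxt for an ONNX model might look like the sketch below; the tensor names, data types, shapes, and batch size are placeholders and must match your exported model. The model file itself should live at bert/1/model.onnx (the default file name the onnxruntime backend looks for):

```
name: "bert"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"        # placeholder; must match the ONNX graph
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"   # placeholder
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"           # placeholder
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```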

Also, I notice your Triton version is about three years old. If you update to the latest release, you can take advantage of auto-complete, and Triton can infer the correct minimal config.pbtxt from your ONNX model as described here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#auto-generated-model-configuration
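On the port question: the listening ports are set with command-line flags when launching tritonserver. A sketch (9000/9001/9002 are arbitrary example values):

```
tritonserver --model-repository=/models \
             --http-port=9000 \
             --grpc-port=9001 \
             --metrics-port=9002
```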