triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Failed to serve model converted by torch2trt #5071

Closed: tldrafael closed this issue 1 year ago

tldrafael commented 1 year ago

Hi, I'm getting the following error when trying to use a model converted by torch2trt with the Triton server: model_repository_manager.cc:1152] failed to load 'resnet50' version 1: Internal: failed to load model 'resnet50': PytorchStreamReader failed reading zip archive: failed finding central directory.

The entire log: triton.err.log.

I have seen issues #1264 and #212. My question is: am I missing a step here?

Reproducible example

I used the NGC PyTorch Docker image (which includes TensorRT) to generate the TRT model:

docker run --rm --gpus all --ipc=host --ulimit memlock=-1 -v /home/rafael/:/home/rafael/ -it nvcr.io/nvidia/pytorch:22.01-py3

# Install the torch2trt package
git clone https://github.com/NVIDIA-AI-IOT/torch2trt && cd torch2trt && python setup.py install

Then, in the Python console inside the container:

import torch
from torch2trt import torch2trt

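# Convert a pretrained torchvision ResNet-50 to a TensorRT engine with torch2trt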
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True).eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()
model_trt = torch2trt(model, [x])

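# Serialize the TensorRT engine and write it into the Triton model repository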
with open('/home/rafael/model_repository/resnet50/1/model.pt', 'wb') as f:
    f.write(model_trt.engine.serialize())

Then, I got this error when I launched the Triton server:

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /home/rafael/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-pyt-python-py3 tritonserver --model-repository=/models
Tabrizian commented 1 year ago

@tldrafael It seems like you are trying to load a TRT model using the PyTorch backend, which is incorrect. You should use the TensorRT backend for TRT models: set backend: "tensorrt" and save the model as a model.plan file instead of model.pt.
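For reference, a minimal repository layout and config.pbtxt for the TensorRT backend might look like the sketch below. The input/output names (input_0 / output_0 are torch2trt's defaults) and the shapes are assumptions and must match the bindings of the serialized engine:

model_repository/
  resnet50/
    config.pbtxt
    1/
      model.plan

name: "resnet50"
backend: "tensorrt"
max_batch_size: 0
input [
  {
    name: "input_0"     # assumed binding name, must match the engine
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }
]
output [
  {
    name: "output_0"    # assumed binding name, must match the engine
    data_type: TYPE_FP32
    dims: [ 1, 1000 ]
  }
]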

tldrafael commented 1 year ago

Thank you @Tabrizian, it worked! I added backend: "tensorrt" to the config.pbtxt file and changed the Triton server Docker image to nvcr.io/nvidia/tritonserver:22.01-py3.
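For anyone landing here with the same error: the only change needed on the export side is to write the serialized engine as model.plan instead of model.pt. A sketch, reusing the paths from the reproduction above:

with open('/home/rafael/model_repository/resnet50/1/model.plan', 'wb') as f:
    f.write(model_trt.engine.serialize())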