triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

inference failed: PyTorch execute failure: Global alloc not supported yet #4112

Closed Michelvl92 closed 2 years ago

Michelvl92 commented 2 years ago

@deadeyegoodwin could you reopen the issue, and let me know if this is enough information?

Description When I run TorchScript inference with Triton on an exported native torchvision RetinaNet model (RETINANET_RESNET50_FPN) and send an image in which the model finds detections, I get the detections back as expected. But when the model has zero detections, I get the following error back from the server:

inference failed: PyTorch execute failure: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Global alloc not supported yet
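
For completeness, this is roughly how the request is sent (a sketch; the server URL is an assumption, and the model name matches the config.pbtxt below):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# an all-zeros image reliably produces zero detections and triggers the error
img = np.zeros((1, 3, 720, 1280), dtype=np.float32)
inp = httpclient.InferInput("INPUT__0", list(img.shape), "FP32")
inp.set_data_from_numpy(img)

result = client.infer("model_name", inputs=[inp])
print(result.as_numpy("OUTPUT__0"))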

This is strange, since testing the TorchScript model directly after conversion works correctly:

print("Starting conversion of RetinaNet to torch-script")
traced = torch.jit.script(new_model)
traced.save(Path(FLAGS.output_modelfile))
print("Finished conversion of RetinaNet to torch-script")

# smoke test on the scripted module
img = torch.ones(1, 3, 720, 1280).to(device=device)
res = traced(img)

# double check after loading the converted model from disk
loaded = torch.jit.load(Path(FLAGS.output_modelfile))
img = torch.ones(1, 3, 720, 1280).to(device=device)
res = loaded(img)

Disabling the JIT executor, as suggested in https://github.com/pytorch/pytorch/issues/69078, does make a difference and works:

parameters: {
  key: "ENABLE_JIT_EXECUTOR"
  value: {
    string_value: "false"
  }
}

But I do not disable anything like this in the original Python code, where the model works correctly.
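
For reference, the eager-side switches that I believe correspond to ENABLE_JIT_EXECUTOR (undocumented torch._C internals, so the mapping is my assumption) can be flipped to compare executor behaviour outside Triton:

import torch

# Assumption: these internal toggles approximate what Triton's
# ENABLE_JIT_EXECUTOR=false does inside the libtorch backend.
torch._C._jit_set_profiling_executor(False)
torch._C._jit_set_profiling_mode(False)

loaded = torch.jit.load("exported_model.pt")  # hypothetical file name
img = torch.zeros(1, 3, 720, 1280, device="cuda")
res = loaded(img)  # zero detections expected for an all-zeros image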

Disabling optimized execution, on the other hand, does not help:

parameters: {
  key: "DISABLE_OPTIMIZED_EXECUTION"
  value: {
    string_value: "true"
  }
}

Enabling inference mode does not help either:

parameters: {
  key: "INFERENCE_MODE"
  value: {
    string_value: "true"
  }
}

Triton Information Using the default 22.03 Triton container.

Hardware: NVIDIA RTX 3090 24GB, Intel i9, 128GB RAM.

To Reproduce Steps to reproduce the behavior.

The model is generated with PyTorch in the default NVIDIA PyTorch container, nvcr.io/nvidia/pytorch:22.03-py3, as follows:

import argparse
from pathlib import Path

import torch
import torch.nn as nn
import torchvision

class Model(nn.Module):

    def __init__(self, model, max_detections=25):
        super(Model, self).__init__()
        self.model = model
        self.max_det = max_detections

    def forward(self, x):

        #x: List[torch.FloatTensor] = x.tolist()
        _, output = self.model(list(x))

        # fixed-size, zero-padded output tensors so Triton always receives
        # the same static shapes, even when there are zero detections
        output_boxes = torch.zeros((1, self.max_det, 4),
                                   dtype=torch.float32)
        output_labels = torch.zeros((1, self.max_det),
                                    dtype=torch.int64)
        output_scores = torch.zeros((1, self.max_det),
                                    dtype=torch.float32)

        for img_idx, det in enumerate(output):
            # clamp to max_det so the fixed-size tensors never overflow
            n_labels = min(len(det['labels']), self.max_det)
            n_scores = min(len(det['scores']), self.max_det)
            n_boxes = min(det['boxes'].size(dim=0), self.max_det)
            output_labels[img_idx, :n_labels] = det['labels'][:self.max_det]
            output_scores[img_idx, :n_scores] = det['scores'][:self.max_det]
            output_boxes[img_idx, :n_boxes, :] = det['boxes'][:self.max_det, :]

        return output_labels, output_scores, output_boxes

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Export Torchvision RetinaNet (retinanet_resnet50_fpn) "
                    "to TorchScript for Triton")

    parser.add_argument('-i',
                        '--input_modelpath',
                        type=str,
                        default=None,
                        help="Path to model that should be converted. If not provided, pre-trained COCO model will be "
                             "loaded")

    parser.add_argument('-o',
                        '--output_modelfile',
                        type=str,
                        required=True,
                        help="Exported torchscript model file")

    parser.add_argument('-c',
                        '--num-classes',
                        type=int,
                        required=True,
                        help="Number of classes that the model has")

    parser.add_argument('-m',
                        '--max-detections',
                        type=int,
                        default=25,
                        help="Number of color channels.")

    FLAGS = parser.parse_args()

    print(torch.cuda.is_available())
    device = torch.device("cuda")

    print("Loading model...")

    if FLAGS.input_modelpath:
        print("Loading model from path")
        model = torchvision.models.detection.retinanet_resnet50_fpn(
            num_classes=FLAGS.num_classes,
            pretrained=False,
            pretrained_backbone=True,
            trainable_backbone_layers=5,
        )

        model_params = torch.load(Path(FLAGS.input_modelpath), map_location=device)
        model.load_state_dict(model_params)
        model.to(device)
        model.eval()
    else:
        print("Using pretrained retinanet_resnet50_fpn COCO model")
        print("Number of classes are no. of coco classes")
        model = torchvision.models.detection.retinanet_resnet50_fpn(
            pretrained=True,
            pretrained_backbone=True,
            trainable_backbone_layers=5,
        )

    new_model = Model(model, FLAGS.max_detections).to(device)

    print("Starting conversion of RetinaNet to torch-script")
    traced = torch.jit.script(new_model)
    traced.save(Path(FLAGS.output_modelfile))
    print("Finished conversion of RetinaNet to torch-script")

Using the following config.pbtxt:

name: "model_name"
platform: "pytorch_libtorch"
default_model_filename: "model_file_name.pt"
max_batch_size : 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, 720, 1280]
  }
]

output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_INT64
    dims: [1, 100]
  },
  {
    name: "OUTPUT__1"
    data_type: TYPE_FP32
    dims: [1, 100]
  },
  {
    name: "OUTPUT__2"
    data_type: TYPE_FP32
    dims: [1, 100, 4]
  }
]
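
To rule out a mismatch between this file and what the server actually loaded, the model metadata can be queried from the client (a sketch, same assumed URL and model name as above):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# both calls return dicts describing the loaded model's inputs and outputs
print(client.get_model_metadata("model_name"))
print(client.get_model_config("model_name"))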

Expected behavior When an image is sent to the server, tensors should be returned with the shapes produced by the wrapper above: labels (1, max_det) INT64, scores (1, max_det) FP32, and boxes (1, max_det, 4) FP32.

When an "empty" image is sent, e.g. all zeros, tensors of the same shapes containing only zeros should be returned instead of the error above.

dyastremsky commented 2 years ago

Can you please fill out the bug report template? We need to be able to reproduce your bug to see what is happening. See template below. If there are any pieces you cannot provide, like your model, we need a sample model that reproduces the bug.

The error logs and a backtrace would also be helpful.

--

Description A clear and concise description of what the bug is.

Triton Information What version of Triton are you using?

Are you using the Triton container or did you build it yourself?

To Reproduce Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior A clear and concise description of what you expected to happen.

deadeyegoodwin commented 2 years ago

Closing; reopen with the required template information.