open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

instance segmentation with converted model on ONNX Runtime #421

Closed habjoel closed 2 years ago

habjoel commented 2 years ago

Hi there, I exported the Swin-S model to ONNX format using tools/deploy.py from MMDeploy (I know that it is not supported yet, but it worked flawlessly nonetheless). I used the following two configs and they led to the same results: instance-seg_onnxruntime_dynamic.py and instance-seg_onnxruntime_static.py. Proof is shown in the two images below, which show the test results of the conversion. The predictions are almost identical for PyTorch and ONNX. (Note that I customized the model to detect 7 classes of waste using transfer learning.)
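For reference, the conversion call looked roughly like this (the model config, checkpoint and test image paths are placeholders for my custom Swin-S setup):

python tools/deploy.py \
    configs/mmdet/instance-seg/instance-seg_onnxruntime_dynamic.py \
    path/to/my_swin-s_config.py \
    path/to/my_checkpoint.pth \
    path/to/test_image.jpg \
    --work-dir work_dir \
    --device cuda:0 \
    --show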

PyTorch: swin_own_31MP_pytorch_dyn

ONNX: swin_own_31MP_onnxruntime_dyn

Now, I tried to run the converted .onnx model in an ONNX Runtime Python script on an NVIDIA Jetson AGX Xavier. However, it does not produce the same results as above and performs far worse.

Here is the output of the script I wrote: bottle_swin_onnx

It seems to detect some things, but not really correctly. To me it seems as if the pretrained weights were somehow lost. For clarity: I converted the model to .onnx on my laptop, which uses slightly different versions of onnx and onnxruntime than the Jetson, but I doubt that this really is the issue...? I also doubt that the issue comes from the Swin Transformer architecture itself, as I went through the same steps with the mask_rcnn_r50_fpn_1x model pretrained on COCO (tested on mmdetection/demo/demo.jpg) and saw the same behaviour.
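In case the version mismatch matters, this is roughly what I run on both machines to compare the environments:

import onnx
import onnxruntime as ort

print('onnx:', onnx.__version__)
print('onnxruntime:', ort.__version__)
print('device:', ort.get_device())
print('providers:', ort.get_available_providers())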

I attach the Python script below in case you want to reproduce the issue. I copied the pre- and post-processing steps mostly from https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/mask-rcnn and modified them slightly to fit my model. I marked the steps I am not 100% sure about with a comment line of "*" above the respective step.

I would be very glad if someone could help me out!!

import numpy as np
import onnx
from onnx import numpy_helper
import onnxruntime as ort
from PIL import Image

import math

import matplotlib.pyplot as plt
import matplotlib.patches as patches

import pycocotools.mask as mask_util
import cv2
print("Imports done!")

# Check model
onnx_model = onnx.load("swin_31MP_dyn.onnx")
onnx.checker.check_model(onnx_model)
# print(onnx.helper.printable_graph(onnx_model.graph))

# Create ORT inference session
ort_session = ort.InferenceSession("swin_31MP_stat.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
ort_session.disable_fallback() #Disable fallback to ensure GPU usage
input_name = ort_session.get_inputs()[0].name #get input name of model
print("Created Inference Session")

def preprocess(image):
    # Resize
    ratio = 800.0 / min(image.size[0], image.size[1])
    image = image.resize((int(ratio * image.size[0]), int(ratio * image.size[1])), Image.BILINEAR)

    # Convert to BGR
    image = np.array(image)[:, :, [2, 1, 0]].astype('float32')

    # HWC -> CHW
    image = np.transpose(image, [2, 0, 1])

    # Normalize
    mean_vec = np.array([102.9801, 115.9465, 122.7717])
    for i in range(image.shape[0]):
        image[i, :, :] = image[i, :, :] - mean_vec[i]

    # Pad to be divisible of 32
    # import math
    padded_h = int(math.ceil(image.shape[1] / 32) * 32)
    padded_w = int(math.ceil(image.shape[2] / 32) * 32)

    padded_image = np.zeros((3, padded_h, padded_w), dtype=np.float32)
    padded_image[:, :image.shape[1], :image.shape[2]] = image
    image = padded_image

    # *********************************************************************
    # Add dimension for mmdet network 
    # (Model needs 4 input dimensions, not 3...that's why I just added an axis
    # don't know if this is the correct way to do it)
    image = image[np.newaxis,...]
    return image

img = Image.open('bottle.jpg')
print("Loaded Image")
img_data = preprocess(img)
print("Preprocessed Image")

boxes_scores, labels, masks = ort_session.run(None, {
    input_name: img_data
})
print("Ran Inference")

print('Boxes:', boxes_scores.shape)
print('Labels:', labels.shape)
print('Masks:', masks.shape)

# *********************************************************************
# Tweaking the inference outputs for postprocessing function
bbox = boxes_scores[0,:,:4] # raw bounding boxes
sco = boxes_scores[0,:,4]   # scores
lab = labels[0,:]           # raw labels
mask = np.transpose(masks, [1, 0, 2, 3])  # [batch, num_dets, h, w] -> [num_dets, batch, h, w] so the loop below iterates over detections

classes = ['Can', 'Carton/Paper', 'GlassBottle', 'Other', 'PlasticBottle', 'PlasticOther', 'Wrapper']

def display_objdetect_image(image, boxes, labels, scores, masks, score_threshold=0.1):
    # Resize boxes
    ratio = 800.0 / min(image.size[0], image.size[1])
    boxes /= ratio

    _, ax = plt.subplots(1, figsize=(12,9))

    image = np.array(image)

    for mask, box, label, score in zip(masks, boxes, labels, scores):
        # Skip detections with score below the threshold
        if score <= score_threshold:
            continue

        # Finding contour based on mask
        mask = mask[0, :, :, None]
        int_box = [int(i) for i in box]
        mask = cv2.resize(mask, (int_box[2]-int_box[0]+1, int_box[3]-int_box[1]+1))
        mask = mask > 0.5
        im_mask = np.zeros((image.shape[0], image.shape[1]), dtype=np.uint8)
        x_0 = max(int_box[0], 0)
        x_1 = min(int_box[2] + 1, image.shape[1])
        y_0 = max(int_box[1], 0)
        y_1 = min(int_box[3] + 1, image.shape[0])
        mask_y_0 = max(y_0 - box[1], 0)
        mask_y_1 = mask_y_0 + y_1 - y_0
        mask_x_0 = max(x_0 - box[0], 0)
        mask_x_1 = mask_x_0 + x_1 - x_0
        im_mask[y_0:y_1, x_0:x_1] = mask[
            mask_y_0 : mask_y_1, mask_x_0 : mask_x_1
        ]
        im_mask = im_mask[:, :, None]

        # OpenCV version 4.x
        contours, hierarchy = cv2.findContours(
            im_mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
        )

        image = cv2.drawContours(image, contours, -1, 25, 3)

        rect = patches.Rectangle((box[0], box[1]), box[2] - box[0], box[3] - box[1], linewidth=1, edgecolor='b', facecolor='none')
        ax.annotate(classes[label] + ':' + str(np.round(score, 2)), (box[0], box[1]), color='w', fontsize=12)
        # ax.annotate(str(label) + ':' + str(np.round(score, 2)), (box[0], box[1]), color='w', fontsize=12)
        ax.add_patch(rect)

    ax.imshow(image)
    plt.show()

# Execute postprocessing function to visualize bounding boxes and masks 
display_objdetect_image(img, bbox, lab, sco, mask)
print("Success")
grimoire commented 2 years ago

MMDetection converts the image to RGB during preprocessing, but your script converts it to BGR (image = np.array(image)[:, :, [2, 1, 0]].astype('float32')). I guess that changes the data distribution.
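For example, the default img_norm_cfg in the MMDetection configs looks like this (the exact mean/std may differ in your custom config): the image stays in RGB and is normalized with both mean and std.

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True)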

habjoel commented 2 years ago

Thank you @grimoire. I removed the line where I convert to BGR... the output is different but still wrong. I'll try removing the other preprocessing steps (scaling and padding) when I work on it again and come back to you with the results.

habjoel commented 2 years ago

Hey @grimoire. Just wanted to confirm that the whole thing works now! I didn't realize at first that I also had to divide by the std, but once I did that it worked like a charm :)

Thanks a lot!
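In case it helps anyone else, the preprocessing that ended up working for me looks roughly like this (the mean/std values are the ones from the img_norm_cfg of my MMDetection config, so check your own config before copying):

import math
import numpy as np
from PIL import Image

def preprocess(image):
    # Resize so the short side is 800 px, matching the test pipeline
    ratio = 800.0 / min(image.size[0], image.size[1])
    image = image.resize((int(ratio * image.size[0]), int(ratio * image.size[1])), Image.BILINEAR)

    # Keep RGB order (MMDetection uses to_rgb=True), then HWC -> CHW
    image = np.array(image).astype('float32')
    image = np.transpose(image, [2, 0, 1])

    # Normalize with mean AND std from img_norm_cfg (RGB order)
    mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
    std = np.array([58.395, 57.12, 57.375], dtype=np.float32)
    for i in range(3):
        image[i, :, :] = (image[i, :, :] - mean[i]) / std[i]

    # Pad height and width to multiples of 32
    padded_h = int(math.ceil(image.shape[1] / 32) * 32)
    padded_w = int(math.ceil(image.shape[2] / 32) * 32)
    padded = np.zeros((3, padded_h, padded_w), dtype=np.float32)
    padded[:, :image.shape[1], :image.shape[2]] = image

    # Add batch dimension -> NCHW
    return padded[np.newaxis, ...]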

sarmientoj24 commented 2 years ago

@habjoel hey, just wondering how you made it work. Could you provide the snippet you used for post-processing?