hailo-ai / hailo_model_zoo

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
MIT License

Do you have plans to support YOLO OBB models? #132

Open saurabh-git-dev opened 1 month ago

saurabh-git-dev commented 1 month ago

Are there any plans to support the latest YOLO OBB models in the near future? Writing the post-processing is not easy for everyone, so I can't move forward without it.

tan199954 commented 1 month ago

@saurabh-git-dev Have you found a solution yet?

saurabh-git-dev commented 1 month ago

@tan199954 Not yet. Here is more information about the issues:
https://community.hailo.ai/t/obb-model-quantization-poor-benchmark/5317
https://community.hailo.ai/t/yolob8n-obb-rotated-nms/5048

tan199954 commented 3 weeks ago

@saurabh-git-dev I got help from ChatGPT, and my code is now working:

import numpy as np
import math

REGRESSION_LENGTH = 15
STRIDES = [8, 16, 32]
names = ['plane', 'ship', 'storage tank', 'baseball diamond', 'tennis court', 'basketball court', 
          'ground track field', 'harbor', 'bridge', 'large vehicle', 'small vehicle', 
          'helicopter', 'roundabout', 'soccer ball field', 'swimming pool']

def softmax(x):
    # subtract the per-row max before exponentiating to avoid overflow
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def _yolov8_obb_decoding(raw_boxes, angles, strides, image_dims, reg_max):
    boxes = None
    for box_distribute, stride, angle in zip(raw_boxes, strides, angles):
        # create grid
        shape = [int(x / stride) for x in image_dims]
        grid_x = np.arange(shape[1]) + 0.5
        grid_y = np.arange(shape[0]) + 0.5
        grid_x, grid_y = np.meshgrid(grid_x, grid_y)
        ct_row = grid_y.flatten() * stride
        ct_col = grid_x.flatten() * stride
        center = np.stack((ct_col, ct_row), axis=1)

        # box distribution to distance
        reg_range = np.arange(reg_max + 1)
        box_distribute = np.reshape(
            box_distribute, (-1, box_distribute.shape[1] * box_distribute.shape[2], 4, reg_max + 1)
        )
        box_distance = softmax(box_distribute)
        box_distance = box_distance * np.reshape(reg_range, (1, 1, 1, -1))
        box_distance = np.sum(box_distance, axis=-1)

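        # split the distances into (left, top) and (right, bottom)
        lt = box_distance[..., :2]
        rb = box_distance[..., 2:]
        cos = np.cos(angle)
        sin = np.sin(angle)

        # offset of the box center from the anchor point, rotated by the angle
        xf, yf = np.split((rb - lt) / 2, 2, axis=-1)
        x = xf * cos - yf * sin
        y = xf * sin + yf * cos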

        xy = np.concatenate([x, y], axis=-1)
        xywh_box = np.concatenate([xy, lt + rb], axis=-1) * stride
        xywh_box[..., :2] += np.expand_dims(center, axis=0)

        boxes = xywh_box if boxes is None else np.concatenate([boxes, xywh_box], axis=1)
    return boxes


def generate_yolo_predictions(endnodes):
    """
    endnodes is a list of 9 tensors, ordered to match STRIDES = [8, 16, 32]
    for a 640x640 input:
        endnodes[0]:  bbox output with shape (BS, 80, 80, 64)
        endnodes[1]:  scores output with shape (BS, 80, 80, 15)
        endnodes[2]:  angles output with shape (BS, 80, 80, 1)
        endnodes[3]:  bbox output with shape (BS, 40, 40, 64)
        endnodes[4]:  scores output with shape (BS, 40, 40, 15)
        endnodes[5]:  angles output with shape (BS, 40, 40, 1)
        endnodes[6]:  bbox output with shape (BS, 20, 20, 64)
        endnodes[7]:  scores output with shape (BS, 20, 20, 15)
        endnodes[8]:  angles output with shape (BS, 20, 20, 1)
    Returns:
        numpy.ndarray: A concatenated array of shape (BS, total_predictions, 4 + num_classes + 1) where:
            - `total_predictions` is the sum of predictions across all scales (80x80, 40x40, 20x20).
            - Each prediction contains, in this order:
                - `4` values for the box in the format [x, y, w, h] (center, width, height, in pixels).
                - `num_classes` values for the confidence scores for each class.
                - `1` value for the rotation angle in radians.
    """
    image_dims = (640, 640)
    raw_boxes = endnodes[:7:3]
    angles = [np.reshape(s, (-1, s.shape[1] * s.shape[2], 1)) for s in endnodes[2::3]]
    angles = [(sigmoid(x) - 0.25) * math.pi for x in angles]
    decoded_boxes = _yolov8_obb_decoding(raw_boxes, angles, STRIDES, image_dims, REGRESSION_LENGTH)
    scores = [np.reshape(s, (-1, s.shape[1] * s.shape[2], len(names))) for s in endnodes[1:8:3]]
    scores = np.concatenate(scores, axis=1)
    angles = np.concatenate(angles, axis=1)
    return np.concatenate([decoded_boxes, scores, angles], axis=2)

saurabh-git-dev commented 3 weeks ago

@tan199954 Are you able to run the post-processing and see rotated detections?

I think you also need to implement rotated NMS.
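
For readers who want to see what rotated NMS involves, here is a minimal, dependency-free sketch. It is not the method used by any particular library: it assumes each box is a hypothetical `(cx, cy, w, h, angle)` tuple with the angle in radians, computes exact polygon IoU with Sutherland-Hodgman clipping, and runs a greedy, class-agnostic suppression. The function names (`obb_corners`, `rotated_iou`, `rotated_nms`) are illustrative only.

```python
import math


def obb_corners(cx, cy, w, h, angle):
    """Corners of a rotated box, counter-clockwise in x/y math coordinates."""
    c, s = math.cos(angle), math.sin(angle)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]


def polygon_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def _side(a, b, p):
    """> 0 if p lies to the left of the directed edge a -> b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` by a convex counter-clockwise `clip`."""
    output = subject
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        inp, output = output, []
        for p, q in zip(inp, inp[1:] + inp[:1]):
            sp, sq = _side(a, b, p), _side(a, b, q)
            if sp >= 0:            # p is inside (or on) this clip edge
                output.append(p)
            if sp * sq < 0:        # edge p -> q crosses the clip line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(box_a, box_b):
    """Exact IoU of two rotated boxes given as (cx, cy, w, h, angle)."""
    inter_poly = clip_polygon(obb_corners(*box_a), obb_corners(*box_b))
    inter = abs(polygon_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def rotated_nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


boxes = [
    (100, 100, 80, 20, 0.0),          # reference box
    (102, 101, 80, 20, 0.05),         # near-duplicate of the first box
    (100, 100, 80, 20, math.pi / 2),  # same center, rotated 90 degrees
    (300, 300, 50, 50, 0.3),          # far away
]
scores = [0.9, 0.8, 0.7, 0.6]
keep = rotated_nms(boxes, scores)
print(keep)  # [0, 2, 3]
```

The box rotated by 90 degrees survives even though it shares a center with the first one, because its exact IoU is only 400 / 2800. An axis-aligned NMS on the enclosing rectangles would not make that distinction. The corners returned by `obb_corners` can also be used to draw the rotated detections.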

tan199954 commented 3 weeks ago

@saurabh-git-dev I'm using the non_max_suppression function from ultralytics with torch on CPU. I convert the output of the generate_yolo_predictions function to a torch tensor and then transpose it to (batch_size, num_classes + 5, num_boxes).
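
The layout change described above can be sketched with NumPy alone. This is an illustration under assumptions, not the poster's actual code: `to_channels_first` is a hypothetical helper name, the input is assumed to be the (BS, num_boxes, 4 + num_classes + 1) array returned by generate_yolo_predictions, and the dummy data stands in for real model output.

```python
import numpy as np

NUM_CLASSES = 15  # matches the 15 DOTA class names used earlier in the thread


def to_channels_first(preds):
    """(BS, num_boxes, 4 + nc + 1) -> (BS, 4 + nc + 1, num_boxes), float32."""
    if preds.ndim != 3 or preds.shape[2] != NUM_CLASSES + 5:
        raise ValueError(f"unexpected prediction shape {preds.shape}")
    # transpose returns a view; make it contiguous so it converts cleanly later
    return np.ascontiguousarray(preds.transpose(0, 2, 1), dtype=np.float32)


# Dummy predictions for a 640x640 input: 80*80 + 40*40 + 20*20 = 8400 boxes
dummy = np.zeros((1, 8400, NUM_CLASSES + 5))
dummy[0, 0, :4] = [320, 320, 100, 40]  # x, y, w, h
dummy[0, 0, 4 + 3] = 0.9               # score for class index 3
dummy[0, 0, -1] = 0.3                  # angle in radians (last channel)

out = to_channels_first(dummy)
print(out.shape)  # (1, 20, 8400)
```

The resulting array could then be wrapped with `torch.from_numpy` and passed to non_max_suppression. The import path and the keyword arguments needed for rotated boxes depend on the installed ultralytics version, so check them against that version's documentation.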