tensorflow / addons

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
Apache License 2.0

Implement fast-nms #671

Closed. fsx950223 closed this issue 4 years ago

fsx950223 commented 4 years ago

Describe the feature and the current behavior/state.

The fast-nms algorithm has better performance than traditional NMS. It is described in the paper. A kernel implementation may provide better performance than a Python implementation.

Relevant information

Which API type would this fall under (layer, metric, optimizer, etc.)

Who will benefit with this feature?

Any other info.

TensorFlow implementation on Colab
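For readers who have not seen the two algorithms side by side, here is a minimal sketch in plain Python (no TensorFlow) of how Fast NMS, as it is usually described, differs from traditional greedy NMS. The function names `iou`, `greedy_nms`, and `fast_nms` and the toy boxes are assumptions made up for illustration; they are not part of any TensorFlow API. The key difference is that in Fast NMS a box that has itself been suppressed can still suppress lower-scoring boxes, which removes the sequential dependency at the cost of sometimes discarding more boxes.

```python
def iou(a, b):
    """IoU of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if inter > 0.0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Traditional NMS: only boxes that were kept can suppress others."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def fast_nms(boxes, scores, iou_threshold=0.5):
    """Fast NMS: every higher-scoring box can suppress, kept or not."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for rank, i in enumerate(order):
        # Maximum IoU against all higher-scoring boxes (the "column max").
        max_iou = max((iou(boxes[i], boxes[j]) for j in order[:rank]),
                      default=0.0)
        if max_iou <= iou_threshold:
            keep.append(i)
    return keep


# Toy data: box 1 overlaps box 0, box 2 overlaps box 1 but barely box 0.
boxes = [(0, 0, 10, 10), (0, 3, 10, 13), (0, 6, 10, 16)]
scores = [0.9, 0.8, 0.7]

greedy_keep = greedy_nms(boxes, scores)
fast_keep = fast_nms(boxes, scores)
print(greedy_keep, fast_keep)
```

On this toy input the greedy version keeps boxes 0 and 2, while the fast version keeps only box 0, because the already-suppressed box 1 still suppresses box 2. The fast version has no loop-carried state, which is what makes it expressible as a single matrix operation.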

WindQAQ commented 4 years ago

Would like to see how fast it is compared with tf.image.non_max_suppression.

SSaishruthi commented 4 years ago

Is anyone working on this? If not, I will take a look @seanpmorgan

seanpmorgan commented 4 years ago

> Is anyone working on this? If not, I will take a look @seanpmorgan

Not that I'm aware of, but @fsx950223 said they are willing to implement it. If there has been no progress, it should be fine to start on it.

fsx950223 commented 4 years ago

I'm a little busy right now; I will try it next week. Thanks.

SSaishruthi commented 4 years ago

I will wait for the next update on this.

samikama commented 4 years ago

@fsx950223 how does this algorithm compare against existing GPU NMS implementations in TF? Do you have any idea?

fsx950223 commented 4 years ago

> @fsx950223 how does this algorithm compare against existing GPU NMS implementations in TF? Do you have any idea?

A TensorFlow benchmark could compare the performance of the two.
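As a rough illustration of what such a comparison might look like, here is a minimal timing harness using only the Python standard library. The helpers `benchmark` and `compare` are hypothetical names invented for this sketch, and the two lambdas are stand-in workloads, not NMS implementations.

```python
import time


def benchmark(fn, warmup=3, iters=20):
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def compare(candidates, **kwargs):
    """Time each named callable and return {name: seconds_per_call}."""
    return {name: benchmark(fn, **kwargs) for name, fn in candidates.items()}


# Stand-in workloads; replace with callables wrapping the two NMS variants.
results = compare(
    {
        "baseline": lambda: sum(range(1000)),
        "candidate": lambda: sum(range(10)),
    },
    warmup=1,
    iters=5,
)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

With TensorFlow installed, one callable could wrap `tf.image.non_max_suppression(boxes, scores, max_output_size)` and the other a fast-nms implementation. When timing on a GPU, each callable should probably force the result back to the host (for example with `.numpy()`), otherwise the measurement may capture only the time taken to enqueue the op.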

bhack commented 4 years ago

TensorFlow Lite has an upstream fast NMS kernel: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/detection_postprocess.cc#L456-L463. @seanpmorgan I think this falls more within the "perimeter" of keras-cv (/cc @tanzhenyu), since yesterday we closed the other image-processing issues/PRs.

anshkumar commented 2 years ago

@bhack I'm not sure how the fast NMS kernel in TensorFlow Lite would work in a normal scenario. Anyway, fast-nms is not hard to implement:

import tensorflow as tf


def _area(boxlist, scope=None):
    # https://github.com/tensorflow/models/blob/831281cedfc8a4a0ad7c0c37173
    # 963fafb99da37/official/vision/detection/utils/object_detection/
    # box_list_ops.py#L48

    """Computes area of boxes.
    Args:
    boxlist: BoxList holding N boxes
    scope: name scope.
    Returns:
    a tensor with shape [N] representing box areas.
    """
    y_min, x_min, y_max, x_max = tf.split(
        value=boxlist, num_or_size_splits=4, axis=1)
    return tf.squeeze((y_max - y_min) * (x_max - x_min), [1])

def _intersection(boxlist1, boxlist2, scope=None):
    # https://github.com/tensorflow/models/blob/831281cedfc8a4a0ad7c0c37173
    # 963fafb99da37/official/vision/detection/utils/object_detection/
    # box_list_ops.py#L209

    """Compute pairwise intersection areas between boxes.
    Args:
    boxlist1: BoxList holding N boxes
    boxlist2: BoxList holding M boxes
    scope: name scope.
    Returns:
    a tensor with shape [N, M] representing pairwise intersections
    """
    y_min1, x_min1, y_max1, x_max1 = tf.split(
        value=boxlist1, num_or_size_splits=4, axis=1)
    y_min2, x_min2, y_max2, x_max2 = tf.split(
        value=boxlist2, num_or_size_splits=4, axis=1)
    all_pairs_min_ymax = tf.minimum(y_max1, tf.transpose(y_max2))
    all_pairs_max_ymin = tf.maximum(y_min1, tf.transpose(y_min2))
    intersect_heights = tf.maximum(0.0, 
        all_pairs_min_ymax - all_pairs_max_ymin)
    all_pairs_min_xmax = tf.minimum(x_max1, tf.transpose(x_max2))
    all_pairs_max_xmin = tf.maximum(x_min1, tf.transpose(x_min2))
    intersect_widths = tf.maximum(0.0, 
        all_pairs_min_xmax - all_pairs_max_xmin)
    return intersect_heights * intersect_widths

def _iou(boxlist1, boxlist2, scope=None):
    # https://github.com/tensorflow/models/blob/831281cedfc8a4a0ad7c0c37173
    # 963fafb99da37/official/vision/detection/utils/object_detection/
    # box_list_ops.py#L259

    """Computes pairwise intersection-over-union between box collections.
    Args:
    boxlist1: BoxList holding N boxes
    boxlist2: BoxList holding M boxes
    scope: name scope.
    Returns:
    a tensor with shape [N, M] representing pairwise iou scores.
    """
    intersections = _intersection(boxlist1, boxlist2)
    areas1 = _area(boxlist1)
    areas2 = _area(boxlist2)
    unions = (tf.expand_dims(areas1, 1) + tf.expand_dims(
        areas2, 0) - intersections)
    return tf.where(
        tf.equal(intersections, 0.0),
        tf.zeros_like(intersections), tf.truediv(intersections, unions))

def _cc_fast_nms(boxes, masks, scores, iou_threshold: float = 0.5, top_k: int = 15):
        # Cross Class NMS
        # Collapse all the classes into 1 
        classes = tf.argmax(scores, axis=-1)+1
        scores = tf.reduce_max(scores, axis=-1)
        _, idx = tf.math.top_k(scores, k=tf.math.minimum(top_k, tf.shape(scores)[0]))
        boxes_idx = tf.gather(boxes, idx, axis=0)

        # Compute the pairwise IoU between the boxes
        iou = _iou(boxes_idx, boxes_idx)

        # Zero out the lower triangle and the diagonal of the IoU matrix
        iou = tf.linalg.band_part(iou, 0, -1) - tf.linalg.band_part(iou, 0, 0)

        # Now that everything in the diagonal and below is zeroed out, if we take the max
        # of the IoU matrix along the columns, each column will represent the maximum IoU
        # between this element and every element with a higher score than this element.
        iou_max = tf.reduce_max(iou, axis=0)

        # Now just filter out the ones greater than the threshold, i.e., only keep boxes that
        # don't have a higher scoring box that would suppress it in normal NMS.
        idx_out = idx[iou_max <= iou_threshold]

        classes = tf.cast(tf.gather_nd(classes, tf.expand_dims(idx_out, axis=-1)), dtype=tf.float32)
        boxes = tf.gather_nd(boxes, tf.expand_dims(idx_out, axis=-1))
        masks = tf.gather_nd(masks, tf.expand_dims(idx_out, axis=-1))
        scores = tf.gather_nd(scores, tf.expand_dims(idx_out, axis=-1))

        return boxes, masks, classes, scores
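To make the triangular masking step concrete, here is a small worked example on a hand-written 3x3 IoU matrix. The matrix values are made up for illustration and assume the boxes are already sorted by descending score; only `tf.linalg.band_part` and `tf.reduce_max` from the snippet above are used.

```python
import tensorflow as tf

# Illustrative pairwise IoU for three boxes sorted by descending score.
iou = tf.constant([[1.0, 0.6, 0.0],
                   [0.6, 1.0, 0.6],
                   [0.0, 0.6, 1.0]])

# band_part(iou, 0, -1) keeps the upper triangle including the diagonal;
# subtracting band_part(iou, 0, 0) (the diagonal alone) leaves the strict
# upper triangle, so entry [i, j] is non-zero only when box i outranks box j.
upper = tf.linalg.band_part(iou, 0, -1) - tf.linalg.band_part(iou, 0, 0)

# Column j now holds the IoU of box j with every higher-scoring box.
iou_max = tf.reduce_max(upper, axis=0)

# Keep boxes whose worst overlap with a higher-scoring box is within bounds.
keep = iou_max <= 0.5
print(upper.numpy())
print(keep.numpy())
```

Here the column maxima are 0, 0.6, and 0.6, so only the first box survives. Note that the third box is removed by the second one even though the second was itself removed, which is the behavior that distinguishes this approach from sequential NMS.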