triple-Mu / YOLOv8-TensorRT

YOLOv8 using TensorRT accelerate !
MIT License
1.32k stars, 228 forks

The NMS is not exported in "export-seg.py" #184

Closed chenxinfeng4 closed 9 months ago

chenxinfeng4 commented 9 months ago

I don't see NMS or top-K being applied in export-seg.py. Why does NMS work for det but not for seg?

python3 export-seg.py \
--weights yolov8n-seg.pt \
--opset 11 \
--sim \
--input-shape 1 3 640 640 \
--device cuda:0

polygraphy inspect model yolov8n-seg.onnx
[I] Loading model: yolov8n-seg.onnx
[I] ==== ONNX Model ====
    Name: torch-jit-export | ONNX Opset: 11

    ---- 1 Graph Input(s) ----
    {images [dtype=float32, shape=(1, 3, 640, 640)]}

    ---- 2 Graph Output(s) ----
    {outputs [dtype=float32, shape=(1, 8400, 38)],
     proto [dtype=float32, shape=(1, 32, 25600)]}

    ---- 168 Initializer(s) ----

    ---- 272 Node(s) ----

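I'm replacing Mask R-CNN with YOLO-seg in a project and accelerating it with TensorRT. Your export works very well for det. The seg output, outputs [dtype=float32, shape=(1, 8400, 38)], looks the same as in det, so why can't NMS be exported for it?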

chenxinfeng4 commented 9 months ago

Is it because NMS can't handle the mask coefficients?

# models/common.py
class PostSeg(nn.Module):
...
        p = self.proto(x[0])  # mask protos
        bs = p.shape[0]  # batch size
        mc = torch.cat(
            [self.cv4[i](x[i]).view(bs, self.nm, -1) for i in range(self.nl)],
            2)  # mask coefficients
        boxes, scores, labels = self.forward_det(x)
        out = torch.cat([boxes, scores, labels.float(), mc.transpose(1, 2)], 2)
        return out, p.flatten(2)
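
For context, each of the 8400 rows in `out` carries 4 box values, 1 score, 1 label and 32 mask coefficients (4 + 1 + 1 + 32 = 38), and `p.flatten(2)` is the 32-channel prototype tensor flattened to 160 x 160 = 25600 pixels. The sketch below shows how one detection's coefficients are usually combined with the prototypes to get a mask: a sigmoid over a linear combination of the prototype channels. This is an assumption about the downstream decoding, not code from the repository; `assemble_mask`, `thr` and the toy values are hypothetical, and cropping to the box and upsampling are left out.

```python
import math

def assemble_mask(mc, proto, h, w, thr=0.5):
    """Combine mask coefficients with flattened prototypes into a binary mask.

    mc    : list of nm coefficients for one detection
    proto : list of nm prototype channels, each a flat list of h * w values
    Returns an h x w list of 0/1 values.
    """
    flat = []
    for k in range(h * w):
        # Linear combination of the prototype channels at pixel k.
        logit = sum(c * channel[k] for c, channel in zip(mc, proto))
        prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        flat.append(1 if prob > thr else 0)
    # Reshape the flat pixel list back to h rows of w pixels.
    return [flat[r * w:(r + 1) * w] for r in range(h)]

# Toy example: 2 prototype channels on a 2 x 2 grid
# (the real model uses 32 channels on a 160 x 160 grid).
proto = [[1.0, -1.0, 1.0, -1.0],
         [0.0, 2.0, 0.0, 0.0]]
mc = [1.0, 1.0]
mask = assemble_mask(mc, proto, 2, 2)
```

The point for this issue is that the mask needs the coefficient row of the same candidate that produced the box, so whatever filters the boxes has to preserve that pairing.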

triple-Mu commented 9 months ago

Because TensorRT currently does not provide an NMS plugin that handles masks, the export for now only performs simple operations such as argmax.
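
Since NMS is not inside the exported graph for seg, it has to run on the host after inference. Below is a minimal pure-Python sketch of a greedy, class-aware NMS that returns row indices, so the same indices can be used to pick the matching mask coefficients. It assumes each row follows the `[box(4), score, label, mask coefficients(32)]` layout from `PostSeg` and that boxes are in `(x1, y1, x2, y2)` form; the function names and the threshold values are hypothetical and not taken from the repository.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms_indices(rows, score_thr=0.25, iou_thr=0.65, max_det=100):
    """Greedy class-aware NMS; returns indices into `rows`, best score first."""
    # Drop low-score candidates, then visit the rest in descending score order.
    order = sorted(
        (i for i, r in enumerate(rows) if r[4] >= score_thr),
        key=lambda i: rows[i][4],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep row i unless an already kept row of the same class overlaps it too much.
        suppressed = any(
            rows[i][5] == rows[j][5] and iou(rows[i][:4], rows[j][:4]) > iou_thr
            for j in keep
        )
        if not suppressed:
            keep.append(i)
            if len(keep) == max_det:
                break
    return keep

# Toy rows: 4 box values, score, label, then 32 mask coefficients.
rows = [
    [0, 0, 10, 10, 0.9, 0.0] + [0.1] * 32,
    [1, 1, 10, 10, 0.8, 0.0] + [0.2] * 32,    # overlaps row 0, same class
    [20, 20, 30, 30, 0.7, 1.0] + [0.3] * 32,
    [0, 0, 10, 10, 0.1, 0.0] + [0.4] * 32,    # below the score threshold
]
keep = nms_indices(rows)
kept_mc = [rows[i][6:] for i in keep]  # mask coefficients of the survivors
```

A pure-Python loop over 8400 candidates is slow; in practice the same logic would be run with a vectorized or GPU NMS that also returns indices.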

chenxinfeng4 commented 9 months ago

Can NMS return the indices into the original sequence, similar to argmax?

chenxinfeng4 commented 9 months ago

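I tried keeping both the original "label & score" outputs and the post-NMS "label & score" outputs (a subset). By comparing the two, I can find the index in the original sequence that each NMS result corresponds to. This might be faster. Is it a reasonable approach?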

$polygraphy inspect model /home/liying_lab/chenxinfeng/DATA/ultralytics/yolov8n-seg.onnx
[I] Loading model: /home/liying_lab/chenxinfeng/DATA/ultralytics/yolov8n-seg.onnx
[I] ==== ONNX Model ====
    Name: torch-jit-export | ONNX Opset: 11 | Other Opsets: {'TRT': 1}

    ---- 1 Graph Input(s) ----
    {images [dtype=float32, shape=(1, 3, 640, 640)]}

    ---- 8 Graph Output(s) ----
    {num_dets [dtype=int32, shape=('1', '1')],
     bboxes [dtype=float32, shape=('1', '100', '4')],
     scores [dtype=float32, shape=('1', '100')],
     labels [dtype=int32, shape=('1', '100')],
     scores8000 [dtype=float32, shape=(1, 8400)],
     labels8000 [dtype=int32, shape=(1, 8400)],
     maskcoeff8000 [dtype=float32, shape=(1, 8400, 32)],
     proto8000 [dtype=float32, shape=(1, 32, 160, 160)]}

    ---- 168 Initializer(s) ----

    ---- 271 Node(s) ----
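
The index-recovery idea in the comment above (comparing the post-NMS `scores`/`labels` with `scores8000`/`labels8000`) could be sketched as follows. This is only an illustration under the assumption that a kept score is equal, or within a small tolerance, to exactly one original score with the same label; `match_indices` and `tol` are hypothetical names, and the approach breaks down if the NMS stage returns modified scores or if two candidates tie.

```python
def match_indices(orig_scores, orig_labels, kept_scores, kept_labels, tol=0.0):
    """For each kept detection, return the index of the original candidate
    with the same label and the closest score within `tol`, or -1 if none."""
    result = []
    for s, l in zip(kept_scores, kept_labels):
        best, best_diff = -1, None
        for i, (o_s, o_l) in enumerate(zip(orig_scores, orig_labels)):
            if o_l != l:
                continue  # labels must agree
            diff = abs(o_s - s)
            if diff <= tol and (best_diff is None or diff < best_diff):
                best, best_diff = i, diff
        result.append(best)
    return result

# Toy data standing in for the 8400 original candidates.
orig_scores = [0.1, 0.9, 0.5, 0.7]
orig_labels = [0, 0, 1, 1]

exact = match_indices(orig_scores, orig_labels, [0.9, 0.7], [0, 1])
# A slightly altered score no longer matches exactly ...
altered = match_indices(orig_scores, orig_labels, [0.9004], [0])
# ... unless a tolerance is allowed.
tolerant = match_indices(orig_scores, orig_labels, [0.9004], [0], tol=1e-3)
```

The recovered indices would then select rows of `maskcoeff8000`. A tolerance makes the match more forgiving but also more ambiguous, since several of the 8400 candidates can have nearly identical scores.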

triple-Mu commented 9 months ago

bboxes [dtype=float32, shape=('1', '100', '4')], scores [dtype=float32, shape=('1', '100')], labels [dtype=int32, shape=('1', '100')], scores8000 [dtype=float32, shape=(1, 8400)], labels8000 [dtype=int32, shape=(1, 8400)], maskcoeff8000 [dtype=float32, shape=(1, 8400, 32)], proto8000 [dtype=float32, shape=(1, 32, 160, 160)]}

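Because the NMS plugin would need to return indices before it could be used for segmentation, and for now I don't know how to make it do that.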

chenxinfeng4 commented 9 months ago

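Thanks for the answer. I have spent the last few days going through the TensorRT documentation, and it is indeed a limitation of the NMS plugin: it does not return indices. In addition, the top-K score values returned by NMS are modified, so they cannot be matched back to the original set of scores. I have given up and will not try to port NMS to seg.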