marcoslucianops / DeepStream-Yolo-Seg

NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO-Segmentation models
MIT License

Output masks retrieved with mask_params.get_mask_array() very inaccurate #15

Open hhackbarth opened 12 months ago

hhackbarth commented 12 months ago

I am using a self-developed Python app to build up a GStreamer pipeline and to insert a probe to interpret the metadata from the YOLOv8 segmentation model. The app is derived from your Python example in the YOLOv8 pose repository (https://github.com/marcoslucianops/DeepStream-Yolo-Pose/blob/master/deepstream.py)

As in your pose example, I am using obj_meta.mask_params.get_mask_array(). For the segmentation model, this returns the segmentation mask as an array of 160x160 float values. If it is scaled to the bounding box of the detected object and converted to a gray-level mask (e.g. 0 for every tensor value below 0.5 and 255 for every tensor value >= 0.5), it can be used for drawing/masking operations. The problem is that this mask is quite inaccurate, as shown in these examples: Screenshot from 2023-12-03 18-45-32 Screenshot from 2023-12-03 18-44-48
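For reference, the thresholding step I described can be sketched like this (a minimal sketch with NumPy; the function name is mine, and the random array just stands in for the 160x160 array returned by get_mask_array()):

```python
import numpy as np

def mask_to_gray(mask_array: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a float mask: 0 for values below the threshold,
    255 for values at or above it (usable as a gray-level mask)."""
    return np.where(mask_array >= threshold, 255, 0).astype(np.uint8)

# Stand-in for obj_meta.mask_params.get_mask_array() reshaped to 160x160
mask = np.random.rand(160, 160).astype(np.float32)
gray = mask_to_gray(mask)
```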

When I use the default deepstream-app with the configuration file given in your repository (modified only to use my sample video), I see much more accurate masks and bounding boxes, as shown in these examples: Screenshot from 2023-12-03 18-35-17 Screenshot from 2023-12-03 18-36-46

Since the model is the same YOLOv8-seg in both cases, I wonder which part of the metadata the default deepstream-app uses for drawing the masks and bounding boxes.

Looking at a discussion at Ultralytics (https://github.com/ultralytics/ultralytics/issues/2953#issuecomment-1571939780), it appears that the segmentation model originally outputs 32 "prototype" masks, which have to be multiplied by the per-detection mask coefficients and summed to obtain the final mask.
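If I understand that discussion correctly, the combination step would look roughly like this (a sketch under my assumptions: 32 prototypes at 160x160, one coefficient vector per detection, and a sigmoid applied to the weighted sum; shapes and names are illustrative, not DeepStream API):

```python
import numpy as np

def combine_prototypes(protos: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """protos: (32, 160, 160) prototype masks from the model head.
    coeffs: (32,) mask coefficients for one detection.
    Returns the combined 160x160 mask after a sigmoid."""
    n, h, w = protos.shape
    # Weighted sum of all prototypes: flatten, matrix-multiply, reshape back
    combined = (coeffs @ protos.reshape(n, h * w)).reshape(h, w)
    return 1.0 / (1.0 + np.exp(-combined))  # sigmoid -> values in (0, 1)

# Illustrative random inputs standing in for real model outputs
protos = np.random.rand(32, 160, 160).astype(np.float32)
coeffs = np.random.rand(32).astype(np.float32)
final_mask = combine_prototypes(protos, coeffs)
```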

So it looks like the mask I get with obj_meta.mask_params.get_mask_array() is only one of these prototype masks, not the final one. The question is how to get the final one, which can be seen when using the original deepstream-app. A similar question applies to the bounding boxes, which I retrieve in the probe using obj_meta.rect_params and which also look less accurate than those drawn by the deepstream-app.
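For completeness, this is how I scale the model-resolution mask to the bounding box before overlaying it (a simple nearest-neighbor resize in NumPy; the helper name is mine, and bbox_w/bbox_h stand for the rounded obj_meta.rect_params width and height):

```python
import numpy as np

def resize_mask_to_bbox(mask: np.ndarray, bbox_w: int, bbox_h: int) -> np.ndarray:
    """Nearest-neighbor resize of a 2D mask to the bounding-box size,
    so it can be overlaid on the detected object in the frame."""
    h, w = mask.shape
    # Map each target pixel back to the nearest source pixel
    ys = np.arange(bbox_h) * h // bbox_h
    xs = np.arange(bbox_w) * w // bbox_w
    return mask[np.ix_(ys, xs)]

mask = np.random.rand(160, 160).astype(np.float32)
scaled = resize_mask_to_bbox(mask, bbox_w=320, bbox_h=240)
```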