luxonis / depthai

DepthAI Python API utilities, examples, and tutorials.
https://docs.luxonis.com
MIT License

[BUG] Super-zoomed image with certain preview image sizes #829

Closed Ghibe closed 4 months ago

Ghibe commented 2 years ago

I'm trying to crop the preview image from the camera using setCropRect in an ImageManip node (as a sort of zoom). However, some values of previewSize or of the crop coordinates corrupt the image and produce a kind of super-zoomed output. I'm using an OAK-1 with depthai version 2.17.2.0.

Here is the output with a 1000x1000 preview image. (screenshot: 1000x1000)

Here is the output with a 995x1000 preview image. As expected, it is slightly more zoomed than the previous one. (screenshot: 995x1000)

But here is the output with a 999x1000 preview image: it is super-zoomed on the folders in the background. (screenshot: 999x1000)

If I set setKeepAspectRatio(False), the issue is mitigated but still present: the output is still more zoomed than the 995x1000 one, and the images are distorted and unsuitable for my NNs. Here is the output with a 999x1000 preview image and setKeepAspectRatio(False). (screenshot: 999x1000 False)

The same problem happens when using setCropRect() on the ImageManip node. It also happens with a 1574x1080 image (1920 wide with crop 0.18), while a 1555x1080 image (1920 wide with crop 0.19) works as expected.

The pipeline is very simple (no NN used for now). (image: pipeline_graph)
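For reference, the crop sizes quoted above follow from the normalized crop rectangle. The snippet below is a minimal sketch of that arithmetic, assuming fractional pixels are truncated; the helper `cropped_size` is hypothetical and is not part of the depthai API, and the device may round differently.

```python
def cropped_size(width, height, xmin, ymin, xmax, ymax):
    """Pixel size of a normalized crop rectangle (coordinates in 0..1).

    Assumption: fractional pixels are truncated; the device may round differently.
    """
    return int(width * (xmax - xmin)), int(height * (ymax - ymin))


# The two crops mentioned in the report, applied to a 1920x1080 frame.
print(cropped_size(1920, 1080, 0.18, 0.0, 1.0, 1.0))  # (1574, 1080)
print(cropped_size(1920, 1080, 0.19, 0.0, 1.0, 1.0))  # (1555, 1080)
```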

Here is the sample code I used to reproduce the problem. I don't know whether I'm making a mistake somewhere or whether this is expected. Thank you for your help!

import cv2
import depthai as dai

SHAPE = 384
xmin = 0.0
image_shape = (999,1000)
#The same behaviour is also visible with different parameters (e.g., image_shape = (1920, 1080) and xmin = 0.18 )

p = dai.Pipeline()
p.setOpenVINOVersion(dai.OpenVINO.VERSION_2021_3)

camRgb = p.create(dai.node.ColorCamera)
camRgb.setPreviewSize(image_shape[0], image_shape[1])
camRgb.setVideoSize(image_shape[0],image_shape[1])
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)

# zoom
zoom_manip = p.create(dai.node.ImageManip)
zoom_manip.setCropRect(xmin, 0.0, 1.0, 1.0)
zoom_manip.setMaxOutputFrameSize(image_shape[0]*image_shape[1] *3)
camRgb.preview.link(zoom_manip.inputImage)

# Crop/rescale to NN
body_det_manip = p.create(dai.node.ImageManip)
body_det_manip.initialConfig.setResize(SHAPE,SHAPE)
body_det_manip.initialConfig.setKeepAspectRatio(True)
zoom_manip.out.link(body_det_manip.inputImage)

# Send rgb frames to the host
rgb_xout = p.create(dai.node.XLinkOut)
rgb_xout.setStreamName("rgb")
body_det_manip.out.link(rgb_xout.input)

# Pipeline is defined, now we can connect to the device
with dai.Device(p) as device:
    qRgb = device.getOutputQueue(name="rgb", maxSize=4, blocking=False)
    shape = (3, SHAPE, SHAPE)

    while True:
        inRgb = qRgb.get()
        frame = inRgb.getCvFrame()
        cv2.imshow("rgb", frame)
        if cv2.waitKey(1) == ord('q'):
            break
themarpe commented 2 years ago

Hi @Ghibe

Sorry for the delay, and thanks for the report.

We saw a similar issue on our end a while ago.

Omitting setKeepAspectRatio(False) was supposedly the root cause, but something else might be at play here.

I'll test this out this week and see about a possible fix for it.

themarpe commented 1 year ago

@Ghibe

The zoom_manip takes in video output. Those frames are NV12 encoded. NV12 has no standard way of encoding odd-width images, which is why the corruption happens. If you restrict the resize to even numbers you should be fine.

Also, rather ingest the isp output, as it's faster to process (it's in YUV420 format) compared to video (NV12). And use the latest release, or perhaps develop, which has some ImageManip-related fixes in.
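To illustrate why NV12 is sensitive to frame dimensions, here is a minimal sketch of the per-plane byte counts. NV12 is a 4:2:0 format: a full-resolution Y plane followed by a single interleaved UV plane subsampled by 2 in each direction. The helper `nv12_plane_sizes` is hypothetical, and it ignores any stride or alignment padding the device may add.

```python
def nv12_plane_sizes(width, height):
    """Return (luma_bytes, chroma_bytes) for one tightly packed NV12 frame.

    The UV plane holds one U and one V byte per 2x2 block of pixels,
    so both dimensions must be even for the blocks to tile the frame.
    Assumption: no stride or alignment padding.
    """
    if width % 2 or height % 2:
        raise ValueError(f"NV12 needs even dimensions, got {width}x{height}")
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2
    return luma, chroma


print(nv12_plane_sizes(1000, 1000))  # (1000000, 500000)
```

An odd size such as 999x1000 raises `ValueError` in this sketch, because the last column cannot be paired into a 2x2 chroma block.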

Ghibe commented 1 year ago

Thank you for your reply! I'll try restricting the sizes as you suggest, but I've already seen the problem appear with even dimensions (e.g., 998x1000 or 1920x1078). Sticking to sizes divisible by 10 does seem to solve it. I need to use the preview output because my code also runs some neural networks (not shown here). Thank you again for your help, and keep up the good work!
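The divisible-by-10 workaround can be sketched as a small helper that snaps a requested size down before it is passed to setPreviewSize or used to compute a crop. This is only an illustration of the empirical workaround from this thread, not a documented depthai constraint; `snap_down` is a hypothetical name.

```python
def snap_down(value, multiple=10):
    """Largest multiple of `multiple` that is <= value (at least one multiple)."""
    return max(multiple, (value // multiple) * multiple)


# Sizes reported as problematic in this thread, snapped to multiples of 10.
for w, h in [(999, 1000), (998, 1000), (1574, 1080), (1920, 1078)]:
    print((w, h), "->", (snap_down(w), snap_down(h)))
```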