stereolabs / zed-yolo

3D Object detection using Yolo and the ZED in Python and C++
https://www.stereolabs.com/
MIT License

Depth Calculation and 2.5D models #21

Closed: marcn68 closed this issue 4 years ago

marcn68 commented 5 years ago

Hi,

I am using the ZED camera with Yolo and everything is working perfectly, but I have a question about the depth. Is the depth computed as a post-processing step with the ZED camera, or does the model used by Yolo have the depth integrated into it? Is it a basic 2D Yolo model, or a special model for the integration with the ZED camera?

And are there any 2.5D models/datasets to train on, or any repository that could help me with that kind of training?

adujardin commented 5 years ago

Hi,

The depth is computed from the stereoscopic images of the ZED, using the ZED SDK. It's independent of the 2D detector: Yolo has typically been trained on 2D image datasets like COCO and ImageNet. The objects' depths are then extracted by combining the 2D object positions with the computed dense depth map.
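For illustration, here is a minimal sketch of that flow using the ZED SDK 3.x Python API (names as in the pyzed docs; the bounding-box center is a hypothetical input coming from Yolo):

```python
# Minimal sketch (ZED SDK 3.x Python API): grab a frame, retrieve the dense
# depth map computed by the SDK, and read the depth at a 2D detection's center.
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.coordinate_units = sl.UNIT.METER

if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Could not open the ZED camera")

depth = sl.Mat()
if zed.grab(sl.RuntimeParameters()) == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_measure(depth, sl.MEASURE.DEPTH)  # dense depth, one value per pixel
    cx, cy = 640, 360  # hypothetical Yolo bounding-box center, in left-image pixels
    err, z = depth.get_value(cx, cy)  # depth in meters at that pixel (may be NaN in holes)
    print("Depth at bbox center: %.2f m" % z)

zed.close()
```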

I don't have a specific dataset in mind. If you want, you could significantly modify the network to take inputs such as image and depth; in that case, you could use RGB-D datasets, simulation, or depth from monocular images. Or you could directly compute the 3D objects from the stereo (or mono) images, similar to this: https://arxiv.org/pdf/1909.07566.pdf

CenterNet includes 3D bbox detection on KITTI with a model (and a lot of 2D model variants for keypoint and object detection).

marcn68 commented 5 years ago

Thank you for your answer. Now I know how it works.

marcn68 commented 5 years ago

Hello again,

I am facing some problems and I want to see if there are any solutions for them:

I hope you can help with these problems.

adujardin commented 5 years ago

The depth map is not guaranteed to be completely dense. Sometimes there can be holes where the object is, leaving the 3D information unavailable.

These holes usually appear when there are occlusions, when the object is too far or too close, when the image is saturated (too bright) or too dark, or when there is not enough texture to estimate the correlation between the left and right images.

You can tweak the way the object's 3D position is extracted, either by taking a bigger search radius or by lowering some thresholds. The extraction function is here: https://github.com/AlexeyAB/darknet/blob/42d08fd820335584365d393da3967853676a8c35/src/yolo_console_dll.cpp#L38-L93
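For reference, here is a rough Python transcription of the idea behind that helper (a hypothetical function, not the repo's code): take the median of the valid depth values in a window around the bounding-box center; enlarging the radius or relaxing the validity test makes the extraction more tolerant of depth holes.

```python
import numpy as np

def object_depth(depth_map, cx, cy, radius=10):
    """Median of the valid depths in a (2*radius+1)^2 window around (cx, cy)."""
    h, w = depth_map.shape
    y0, y1 = max(0, cy - radius), min(h, cy + radius + 1)
    x0, x1 = max(0, cx - radius), min(w, cx + radius + 1)
    window = depth_map[y0:y1, x0:x1]
    valid = window[np.isfinite(window) & (window > 0)]  # drop NaN/inf/zero holes
    return float(np.median(valid)) if valid.size else float("nan")
```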

christiantheriault commented 4 years ago

Hello, the depth displayed numerically (e.g. 10.2 m) corresponds to which pixel inside the Yolo bounding box? The center?

adujardin commented 4 years ago

@christiantheriault Yes, the depth corresponds to the median depth around the center (the radius is currently 10 pixels).

christiantheriault commented 4 years ago

Thank you. One more question! I have a research project on stereo vision and distance computation, and I will most likely order the ZED camera. Using depth, we can probably get the code to output the "real-life" width and height of the Yolo bounding box, i.e. the actual width and height of the real object. Right?
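For intuition, under a pinhole-camera model this is just a rescaling by depth: a box that is w_px pixels wide, seen at depth z meters, spans roughly w_px * z / fx meters in the real world, where fx is the calibrated focal length in pixels. A minimal sketch (hypothetical helper):

```python
def bbox_real_size(w_px, h_px, z, fx, fy):
    """Approximate real-world width/height (meters) of a w_px x h_px box at depth z."""
    return w_px * z / fx, h_px * z / fy

# e.g. a 200 x 400 px box at 3 m with fx = fy = 700 px -> about 0.86 m x 1.71 m
print(bbox_real_size(200, 400, 3.0, 700.0, 700.0))
```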


adujardin commented 4 years ago

Yes, that's how the object detection module in the SDK 3.0 (https://www.stereolabs.com/docs/object-detection/) works.
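A minimal outline of using that module (names as in the ZED SDK 3.x Python API; a sketch rather than a complete program):

```python
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.coordinate_units = sl.UNIT.METER
if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Could not open the ZED camera")

# The object detection module needs positional tracking to be enabled first.
zed.enable_positional_tracking(sl.PositionalTrackingParameters())
zed.enable_object_detection(sl.ObjectDetectionParameters())

objects = sl.Objects()
if zed.grab(sl.RuntimeParameters()) == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_objects(objects, sl.ObjectDetectionRuntimeParameters())
    for obj in objects.object_list:
        # Each detected object carries a 3D position and real-world dimensions.
        print(obj.id, obj.position, obj.dimensions)

zed.close()
```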

christiantheriault commented 4 years ago

That's great!!! Using the YOLO detector?


akshayklr057 commented 3 years ago

Hi @adujardin, I have a question regarding depth calculation. I have SVO files from the ZED 2 camera, which I exported to corresponding png images (left, right & depth map) using a Python export script.

However, I'm not sure how to calculate the depth, i.e. the distance of every pixel from the camera. Currently, I'm using this formula: depth = baseline * focal / disparity, with baseline = 12 cm, focal = 1000 pixels, and disparity = the pixel values from the depth map. I'm getting very weird depth values; some go up to 60 meters. Attached are the corresponding images (depth000001, left000001, right000001). I'm trying to calculate the depth of the fish in the image.
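For reference, plugging the stated numbers into depth = baseline * focal / disparity shows how sensitive the result is to small disparity values, which is one way misread pixel values lead to implausible depths (the disparities below are made-up examples):

```python
baseline_m = 0.12  # 12 cm baseline, as stated above
focal_px = 1000.0  # focal length in pixels, as stated above

for disparity_px in (2.0, 10.0, 60.0, 120.0):
    depth_m = baseline_m * focal_px / disparity_px
    print(f"disparity {disparity_px:6.1f} px -> depth {depth_m:5.2f} m")
```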

adujardin commented 3 years ago

@akshayklr057 If you exported the depth as png, then it's already the depth value in metric units (millimeters, to fit the png value range). If it's the disparity, it won't work, as float values are needed and png can only store 16-bit integers (0-65535). I suggest you use the .exr format, natively supported by OpenCV, to save the disparity values (or a numpy array).
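A quick sketch of that suggestion (note: depending on the OpenCV build, EXR support may need to be enabled via an environment variable before importing cv2; np.save always works):

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # some builds require this for .exr

import cv2
import numpy as np

disparity = np.random.rand(720, 1280).astype(np.float32)  # stand-in float map

cv2.imwrite("disparity.exr", disparity)  # float values, no 8/16-bit quantization
np.save("disparity.npy", disparity)      # or simply a numpy array

restored = cv2.imread("disparity.exr", cv2.IMREAD_UNCHANGED)
print("max round-trip error:", float(abs(restored - disparity).max()))
```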

Your formula is correct, but the focal is not a fixed value: it's a calibrated parameter that depends on both the resolution and the camera used. You need to get the exact value from either the calibration file (typically in /usr/local/zed/resources/) or from the API (sl.Camera.get_camera_information().calibration_parameters.left_cam.fx, see https://www.stereolabs.com/docs/api/python/classpyzed_1_1sl_1_1CameraParameters.html).
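A minimal sketch of reading fx from the API (names per the linked pyzed docs):

```python
import pyzed.sl as sl

zed = sl.Camera()
if zed.open(sl.InitParameters()) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Could not open the ZED camera")

calib = zed.get_camera_information().calibration_parameters
fx = calib.left_cam.fx  # focal length in pixels, for the current resolution
print("left camera fx:", fx, "px")
zed.close()
```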

akshayklr057 commented 3 years ago

@adujardin did you see the images I provided above? I exported the depth to a png image using zed-examples/python/export.py with mode=3.

I need to get the depth in metric units in Python using those exported png files. Could you please guide me on how to do that?

Right now I'm reading those depth images using OpenCV's imread() method, which returns the png pixel values. Treating that pixel range as disparity, I used it in the above-mentioned formula.

Or maybe you could tell me how exactly I can get the depth in Python if I only have .svo files.

adujardin commented 3 years ago

I see: you saved the depth as a normalized image, but you need the actual values. You can use the same sample with mode=4; it will output the depth as a 16-bit png in millimeters (the value is directly the depth in mm, from 0 to 65 m).
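A short sketch of reading such a mode-4 export back in Python (assuming one of the file names from the images above): the png must be read with IMREAD_UNCHANGED so OpenCV keeps the 16-bit values, then divided by 1000 to get meters.

```python
import cv2

depth_mm = cv2.imread("depth000001.png", cv2.IMREAD_UNCHANGED)  # uint16, millimeters
depth_m = depth_mm.astype("float32") / 1000.0                   # meters
h, w = depth_m.shape
print("depth at image center: %.2f m" % depth_m[h // 2, w // 2])
```

Alternatively, the .svo file can be opened directly with the SDK (init_params.set_from_svo_file(path) in the 3.x Python API) and the float depth retrieved with retrieve_measure, as in the sketch earlier in the thread.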

akshayklr057 commented 3 years ago

Thank you for the solution, it does work now.