luxonis / depthai-hardware

Altium Designs for DepthAI Carrier Boards
MIT License

OAK-D Pro #114

Closed Luxonis-Brandon closed 1 year ago

Luxonis-Brandon commented 3 years ago

Already released, see product documentation here


Preorders available: OAK-D-Pro

Start with the why:

1. Mechanical design.

The mechanical design of OAK-D is limiting, with the following drawbacks:

2. Active Illumination

OAK-D was architected for applications where passive depth performs well (and can often outperform active depth, since IR from the sun can be blocked by the optics when purely passive disparity depth is used).

There are many applications, however, where active depth is absolutely necessary (operation in no light, with big/blank surfaces, etc.). And there are many OAK-D customers who would like to use the power of OAK-D in these scenarios, where purely passive depth is prohibitively limited.

Move to the how:

The idea is that they'd be used in one of these permutations:

  1. IR laser on for all frames
  2. IR LED on for all frames
  3. IR laser on for even frames, odd frames no illumination (ambient light)
  4. IR LED on for even frames, odd frames no illumination (ambient light)
  5. IR laser on for even frames, IR LED on for odd frames
  6. IR laser and IR LED on for all frames
  7. IR laser and IR LED both off for all frames.

It is likely that modes 1 and 5 will be used the most, but enabling all the permutations above allows maximum flexibility for adapting illumination to an application (including dynamically). For example, mode 6 will likely rarely be used, but there are certain cases where having both on may be beneficial.
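
As a rough illustration of the static permutations only, here is a minimal host-side sketch assuming the per-emitter brightness controls in the depthai Python API (setIrLaserDotProjectorBrightness / setIrFloodLightBrightness); the frame-interleaved modes 3-5 would need firmware-level strobe control and are not shown:

```python
import depthai as dai

# Hypothetical helper: select one of the static illumination permutations.
# Brightness values are drive currents in mA and are illustrative only.
def set_illumination(device: dai.Device, mode: int,
                     laser_ma: float = 800, led_ma: float = 500) -> None:
    if mode == 1:      # IR laser on for all frames
        device.setIrLaserDotProjectorBrightness(laser_ma)
        device.setIrFloodLightBrightness(0)
    elif mode == 2:    # IR LED on for all frames
        device.setIrLaserDotProjectorBrightness(0)
        device.setIrFloodLightBrightness(led_ma)
    elif mode == 6:    # IR laser and IR LED on for all frames
        device.setIrLaserDotProjectorBrightness(laser_ma)
        device.setIrFloodLightBrightness(led_ma)
    elif mode == 7:    # both off, i.e. behave like a passive OAK-D
        device.setIrLaserDotProjectorBrightness(0)
        device.setIrFloodLightBrightness(0)
    else:
        raise ValueError("Frame-interleaved modes (3-5) need firmware strobe control")

pipeline = dai.Pipeline()  # build the usual stereo pipeline here
with dai.Device(pipeline) as device:
    set_illumination(device, mode=1)
```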

Move to the what:

Same image sensors and FOV as OAK-D:

Luxonis-Brandon commented 3 years ago

It's here! And everything works! image

0ut5ider commented 3 years ago

What "IR laser dot projector" will you be using?

Luxonis-Brandon commented 3 years ago

BELAGO 1.1: https://ams.com/belago1.1

ghost commented 3 years ago

Good choice, I tested it a few days ago. Looks good.

diablodale commented 3 years ago

A large set of my customers need active depth. My customers have had active depth since 2013 with Kinect v1, v2, and now v3. I have started work to update my solutions to use OAK. The gaps are in two areas (active depth, good multi-human pose). The Pro model will close the first gap. The latter is still outstanding: robustly supporting multiple people at 30 fps.

My customers often create things/solutions in low light->dark environments for music/art/museum/tradeshow/interactive installations, research, coursework, and innovative flashy-blinky-shiny creations that crave the dark. Historically, Kinect was the go-to, but Microsoft's 3rd revision has had no code progress and no hardware purchasable for ~1.5 years. RealSense was an option for a few years, but Intel has recently deprioritized/exited the depth market. There are a few smaller players, but they lack progress or are outside cost budgets.

I am very interested in OAK-D-PRO.

Luxonis-Brandon commented 3 years ago

Thanks @diablodale ! We're quite excited to get these out.

My customers often create things/solutions in low light->dark environments for music/art/museum/tradeshow/interactive installations, research, coursework, and innovative flashy-blinky-shiny creations that crave the dark.

Yes ^. We were just discussing this offline, and we figured that in a lot of artistic installations it is desirable and/or common for there to be low light or almost no light, to make the exhibit captivating.

Good multi-human pose... is still outstanding: robustly supporting multiple people at 30 fps.

CC: @tersekmatija on this. In case he's seen anything.

Thanks again, Brandon

tersekmatija commented 3 years ago

Hey, so right now I think we have a few single pose models, not sure how we stand with multi-human pose models, but it's definitely a possibility. We actually have a few community members that are active in this field. If you want, you can check out our Discord (we have a dedicated #human-pose-estimation channel), as there are quite a few good resources there I think :)

Link: https://discord.gg/zN5CkquJtD

diablodale commented 2 years ago

Are you designing OAK-D-PRO hardware to deal with peer sensor interference? With both emitted dot patterns and TOF methods there is interference. A minority of my customers will use multiple sensors with their FOV overlapping. They do this to greatly increase FOV by merging, to fill in occlusions, or to surround objects (like with 3 sensors) to merge depth/pointclouds and create 360° views.

And in all of these there will be interference between sensors. There have been hacks with dot-pattern sensors... attaching tiny vibrating motors directly to the sensors, which somehow leads to a sensor more successfully "seeing" only its own dots. With TOF, I haven't seen hacks, and instead the use of sync signals sent between sensors over a wire. Sync signals are a widely used/known thing... just pulse it and, synchro-bingo-bango, everyone has a shared clock, and now an API can control offsets so no TOF unit conflicts with another.

Luxonis-Brandon commented 2 years ago

Are you designing OAK-D-PRO hardware to deal with peer sensor interference?

Not explicitly. That said our multi-camera sync is likely accurate enough that an inherited solution already exists and is likely to be sufficient. At least for active disparity depth - so for OAK-D-Pro. (Not sure on ToF. Will need investigation.)

diablodale commented 2 years ago

I recommend you try it now in hardware prototyping to see/know the behavior (and perhaps choose not to fix it), rather than be surprised after manufacturing and have to react. Get three sensors and set up at least these two scenarios: image

In all dot cameras I have experience with and/or have read about, the interference is substantial.

What multi-camera sync? The only thing I've seen is an attempt to match frames by timestamp. https://github.com/luxonis/depthai-experiments/blob/master/gen2-deeplabv3_depth/main.py#L66-L84 Such a solution does nothing for interference. The data is corrupted in the depth emitter/camera. It is exactly like the dot emitter failing to draw its dots correctly. Without correct dots, everything breaks down.
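
To illustrate, that host-side timestamp matching boils down to something like the following sketch (the frame lists here are hypothetical); it can only pair frames that were already captured, it cannot undo corrupted dots:

```python
# Minimal sketch (hypothetical inputs): pick, from a buffer of recent depth
# frames, the one whose host timestamp is closest to a given RGB frame.
# This pairs frames after the fact - it cannot fix interference at capture time.
def closest_by_timestamp(depth_frames, rgb_frame, tolerance_ms=20):
    best = min(depth_frames,
               key=lambda f: abs((f.getTimestamp() - rgb_frame.getTimestamp()).total_seconds()))
    dt_ms = abs((best.getTimestamp() - rgb_frame.getTimestamp()).total_seconds()) * 1000
    return best if dt_ms <= tolerance_ms else None
```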

This isn't a showstopper for me. Rather, a hardware feature to consider when 2+ active sensors have overlapping FOV.

diablodale commented 2 years ago

Like https://www.cs.unc.edu/~maimone/media/kinect_VR_2012.pdf

michaelkr commented 2 years ago

@diablodale I'm interested in how that might occur. With structured light or Lidar, yes of course, but overlapping illumination patterns in stereo systems are expected to improve the resulting depth map, not degrade it. Interested, Michael

diablodale commented 2 years ago

I'm interested also. In the left scenario I drew above, there will be a field of 3x the number of dots all in the same FOV, and none of the three OAKs knows there is another OAK... or two other OAKs.

This field of dots is not readily consistent. The 3 emitters are not at equal distances from surfaces, and not at equal angles. Therefore, the dots in a single set change their relative distances within their own set... and relative to the other two sets, all from the perspective of each of the three (or 6) cameras, creating complex moiré patterns.

If an OAK knew there were 2 other cameras... perhaps it could somehow identify and isolate its own dots, then somehow isolate the other dots and ignore them. That seems like a lot of work and code to me, and nothing I've seen any depth sensor do to date.

michaelkr commented 2 years ago

I don't believe those things should matter - stereo-based systems don't need to associate projected patterns with an individual sensor (or any sensor at all, for that matter). The dots are there just to provide texture in the scene in situations where there is none (uniformly painted walls, e.g.), so that stereo matching can occur - but it does not matter from what source. For example, the Intel RealSense stereo cameras (not their structured light cameras) are of a similar design and can overlap without issue.

Luxonis-Brandon commented 2 years ago

@diablodale I'm interested in how that might occur. With structured light or Lidar, yes of course, but overlapping illumination patterns in stereo systems are expected to improve the resulting depth map, not degrade it. Interested, Michael

Yes. That is what I've observed as well. A key premise of active stereo depth that's necessary to understand in this conversation is that laser dot projectors simply add information to the scene. So when having multiple OAK-D-Pro units, and thus overlapping projectors, there is simply more information added to the scene, and the stereo depth performance from all cameras actually improves. And because of realistic placement of devices, it's (practically speaking) impossible for information to be removed from the scene. We have tested and confirmed this in experimental settings as well.

So @diablodale - the IR laser dot projectors can be thought of like a can of spray paint. But instead of having to physically spray-paint texture onto all your blank walls, the IR laser dot projectors do this with IR - adding visual texture and visual interest to the scene, giving blank walls and blank surfaces features for feature matching.

Below is an example that shows how/why this is needed for stereo depth: image

Notice that you can't even tell that there are two images when looking at the back white wall there.

The laser dot projector literally just adds texture to the wall, so that you would then be able to see that there are two images and match them.

And here is syncing multiple cameras: https://github.com/luxonis/depthai-experiments/tree/master/gen2-seq-num-sync#sync-multiple-devices

Notice that the grayscale sync is what is necessary here. And at least in that example the grayscale sensors (and their triggering of the IR laser projection) are in sync to at least the granularity of the timer on the screen, which is milliseconds.

So the multi-camera sync in this case is able to get the two cameras (4 grayscale sensors) to within 1 millisecond of each other. And note that the color camera does not have hardware sync, and is rolling shutter, so it is within 10 milliseconds.

image

When I mentioned syncing multiple active-stereo depth cameras, I meant syncing the emitters so they are active during the same time. The above may be good enough, but we have not explicitly tested it. As @michaelkr mentioned, the worst case here is that the depth quality simply remains the same as if only a single camera were used. But most likely, when multiple cameras are used, the overlapping of the textures they are superimposing will improve the depth quality for both.
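
For reference, the pairing in that example is essentially by sequence number; a rough sketch (queue handles are assumed to come from device.getOutputQueue for the left/right grayscale streams):

```python
# Rough sketch (assumed queue handles): pair left/right grayscale frames by
# sequence number, which is what keeps the IR-illuminated exposures matched
# between the two sensors.
def paired_frames(q_left, q_right):
    buf_left, buf_right = {}, {}
    while True:
        for q, buf in ((q_left, buf_left), (q_right, buf_right)):
            frame = q.tryGet()
            if frame is not None:
                buf[frame.getSequenceNum()] = frame
        for seq in sorted(buf_left.keys() & buf_right.keys()):
            yield buf_left.pop(seq), buf_right.pop(seq)
```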

Thanks, Brandon

Luxonis-Brandon commented 2 years ago

I don't believe those things should matter - stereo-based systems don't need to associate projected patterns with an individual sensor (or any sensor at all, for that matter). The dots are there just to provide texture in the scene in situations where there is none (uniformly painted walls, e.g.), so that stereo matching can occur - but it does not matter from what source. For example, the Intel RealSense stereo cameras (not their structured light cameras) are of a similar design and can overlap without issue.

Well said. Agreed. Same observations here.

diablodale commented 2 years ago

Thanks, I now get the dot-interference thinking 👍 This was a key distinction for me to grasp about how depth data will be derived with OAK-D-Pro. I had not considered that the stereo disparity approach would continue to be used with active emission, rather than the single-camera dot approach of PrimeSense or earlier Kinect models.

The emitter API lists only 6 values in the OP. I request a 7th which is Laser=off/LED=off. ❌❌ In that off/off mode I would hope OAK-D-Pro generates results the same as OAK-D in "normal" lighting situations.

Sync across multiple OAKs

From what I can see from sync and timestamps in DepthAI, the sync method used in the multicam sync at https://github.com/luxonis/depthai-experiments/tree/master/gen2-seq-num-sync#sync-multiple-devices uses a host CPU timestamp placed in the ImgFrame via https://github.com/luxonis/depthai-core/blob/7d76a830ffc51512adae455ec28b1150eabec513/src/pipeline/datatype/ImgFrame.cpp#L11-L13

Variations/delays that occur within the OAK sensor hardware to collect enough photons via exposure time, calculate disparity or debayer color, USB wire latency, USB controller, PCIe, OS, OAK driver, and finally the DepthAI SDK will cause jitter and/or packets to arrive at slightly different times. The latency from the photons hitting the sensor to the DepthAI SDK on line 11 above will vary slightly on every packet.

I see there is getTimestampDevice() which likely has access to a monotonic clock/timestamp value applied by an OAK within the OAK hardware to the original data itself. Hopefully just after the photons are collected. True? https://github.com/luxonis/depthai-core/blob/7d76a830ffc51512adae455ec28b1150eabec513/src/pipeline/datatype/ImgFrame.cpp#L24 and https://github.com/luxonis/depthai-core/pull/174 That PR work suggests that the timestamps between the three cameras on a single OAK are being coordinated. But not that timestamps across different OAKs are being coordinated.

If that device-side monotonic clock were hardware-synchronized across OAKs, then tight frame sync could be achieved across sensors... even sensors on different computers (by using PTP or very high-precision NTP). In the absence of hardware sync, the monotonic clocks on different sensors will not themselves be the same... and hardware clocks always drift... resulting in the same variation challenge.

What's possible? 🤔 If the hardware design of the first OAK-D-PRO does not have a cable/hardware clock sync, then what can the DepthAI SDK do to calculate latency and/or assist in clock sync? Perhaps consider how prosumer analog camera flashes work. One flash is the "master" and sends out an early burst of light which all the other flashes see. Then all the flashes have a starting point and know how long to wait, and then all flash together with very high precision.

Could there be something emitted by the laser on one OAK which acts as a sync seen by the other OAKs? This is most likely a hardware/firmware feature. Perhaps that is done at startup and then all device monotonic clocks can be set to zero. Then, with zero as the baseline in getTimestampDevice(), an offset to host UTC time can be established, and thereafter timestamps can be coordinated and drift soft-corrected.
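
To sketch the soft-correction part of that idea (this is not an existing DepthAI feature, just an illustration): collect (device timestamp, host timestamp) pairs from arriving frames and fit offset plus drift, so device time can be mapped onto the host clock.

```python
import numpy as np

# Sketch only: estimate host-clock offset and drift of a device's monotonic
# clock from frames already received (frames are assumed ImgFrame objects).
def fit_clock_model(frames):
    dev = np.array([f.getTimestampDevice().total_seconds() for f in frames])
    host = np.array([f.getTimestamp().total_seconds() for f in frames])
    drift, offset = np.polyfit(dev, host, 1)   # host ~= drift * dev + offset
    return drift, offset

def device_to_host_seconds(t_device, drift, offset):
    # Map a device-clock time (seconds) onto the host clock.
    return drift * t_device + offset
```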

nalzok commented 2 years ago

Is there any chance you can center the laser dot projector on the device? Maybe it's just me, but the asymmetric design doesn't seem super elegant. It would be awesome if you could move the projector right below the color camera!

doisyg commented 2 years ago

I don't believe those things should matter - stereo-based systems don't need to associate projected patterns with an individual sensor (or any sensor at all, for that matter). The dots are there just to provide texture in the scene in situations where there is none (uniformly painted walls, e.g.), so that stereo matching can occur - but it does not matter from what source. For example, the Intel RealSense stereo cameras (not their structured light cameras) are of a similar design and can overlap without issue.

Same observation here, we have been using D435s with overlapping FoV for a while with no issues. To my understanding, as long as the patterns are sharp enough, they add more texture for the stereo matching. The only way I see it could have an adverse effect is if you have enough patterns projected to saturate the camera (or, to re-use the spray analogy, if you have painted your wall completely and uniformly).

Luxonis-Brandon commented 2 years ago

Is there any chance you can center the laser dot projector on the device? Maybe it's just me, but the asymmetric design doesn't seem super elegant. It would be awesome if you could move the projector right below the color camera!

Unfortunately no. There's no room in the center: image

But more importantly, the design is thermally symmetric, which is the most important part. The heat generators are intentionally located in specific locations to maximize thermal dissipation efficiency, and moving the IR laser and IR LED would imbalance this, reducing overall performance because of higher ambient operating temperatures as a result of high leakage current in the main IC.

Luxonis-Brandon commented 2 years ago

The emitter API lists only 6 values in the OP. I request a 7th which is Laser=off/LED=off. ❌❌ In that off/off mode I would hope OAK-D-Pro generates results the same as OAK-D in "normal" lighting situations.

Yes. I should have had that on there. It is indeed already a planned mode.

ghost commented 2 years ago

What stereo matching algorithm do you want to use? What limitations does it have? Resolution / disparity range / FPS?

Luxonis-Brandon commented 2 years ago

So it's a long answer with a LOT of details/options - but they are all here: https://docs.luxonis.com/projects/api/en/latest/components/nodes/stereo_depth/. And feel free to let us know if anything you are looking for is not there.
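
For reference, a minimal StereoDepth pipeline along the lines of those docs looks like the sketch below; parameter values are illustrative, not the tuned defaults we will ship for OAK-D Pro:

```python
import depthai as dai

# Minimal StereoDepth pipeline sketch (illustrative settings only).
pipeline = dai.Pipeline()

mono_left = pipeline.create(dai.node.MonoCamera)
mono_right = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("depth")

mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
mono_left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
mono_right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)

stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
stereo.initialConfig.setConfidenceThreshold(200)  # 0..255
stereo.setLeftRightCheck(True)
stereo.setSubpixel(True)

mono_left.out.link(stereo.left)
mono_right.out.link(stereo.right)
stereo.depth.link(xout.input)
```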

Luxonis-Brandon commented 2 years ago

Got some initial synced testing going. The census transform and disparity depth parameters need some tuning to make the match work well w/ the projector in the scene. But you can see the projector is EXTREMELY visible, which is the main purpose of the test.
image image image image

diablodale commented 2 years ago

Thanks, I can infer a lot from these. Some things that come to mind...

Luxonis-Brandon commented 2 years ago

Thanks.

A super small set of people might want an API to control the power of the flood and laser as you have in your test app. It adds some flexibility to manage the balance of photons/exposure time/ISO/aperture. We don't have aperture, so we are currently limited to ISO (grainy or not) and exposure time (blur or not). Naturally, there are power/heat considerations, ?regional regulations?, ...and maybe over-tweakability.

Yes. Planned. Will be available in API. We are intrinsically eye-safe, so no matter what parameter is tweaked, the device cannot be made unsafe. We determined this today. Another way to put this is: The hardware is incapable of driving the laser at a high-enough power that it would become unsafe to eyes.

I can see artifacts in the disparity data which relate to the laser pattern. This will be a new consideration to manage.

Agreed. I meant to mention those specifically. We are adding the capability for anyone to fine-tune (at run-time) all the internals of the depth pipeline. https://github.com/luxonis/depthai-python/pull/377

And we will do this ourselves and provide defaults that work well for OAK-D Pro.
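
Roughly, that run-time tuning will look like the sketch below (the stream name is an assumption, "pipeline" and "stereo" refer to an already-built StereoDepth pipeline like the one sketched earlier, and only a couple of the tunable fields are shown):

```python
import depthai as dai

# Sketch of run-time depth tuning via the StereoDepth node's config input.
xin_cfg = pipeline.create(dai.node.XLinkIn)
xin_cfg.setStreamName("stereo_cfg")
xin_cfg.out.link(stereo.inputConfig)

with dai.Device(pipeline) as device:
    cfg_queue = device.getInputQueue("stereo_cfg")

    # Later, while frames are streaming, push a new configuration:
    cfg = dai.StereoDepthConfig()
    cfg.setConfidenceThreshold(180)
    cfg.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
    cfg_queue.send(cfg)
```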

I understand this is early code and you caveat it. 👍 I'm hoping that later iterations of the laser have the same or more success than flood-only.

Agreed. It will. Also, these cameras are currently hot-glued into the prototype so they can be swapped in/out easily. So take the tests with a grain of salt - the calibration fades really easily as the device heats up and the hot glue becomes malleable. So the possibility exists that the artifacts are just because of that (as the laser dot projector and IR LED produce more heat).

This of course will not be an issue for production units.

When your team is ready, I'm interested to see 2-3 overlapping lasers. In these pics, a single laser dot is quite large. I'm curious if overlapping lasers will merge dots to create fields of light, resulting in unmatchable sameness like that seen on the flood-only right wall.

Yes. And also, reducing the laser power will actually result in smaller dots, which can help for tuning with multiple overlapping cameras.

Luxonis-Brandon commented 2 years ago

IR LED & Laser Pulsing https://youtu.be/SWDQekolM8o

This shows the exposure time of the OV9282 IR-capable global shutter grayscale cameras (yellow) and the IR LED/IR laser output (blue).

KySmith1 commented 2 years ago

@Luxonis-Brandon - This looks fantastic! Super excited for the release. Any estimates on the targeted ship date for the hardware?

Luxonis-Brandon commented 2 years ago

Sorry @KySmith1 I missed this. Likely February 2022 but just a guess right now.

gluxkind-k commented 2 years ago

It's here! And everything works! image

What is the width dimension of this (including the enclosure and without the enclosure)?

Luxonis-Brandon commented 2 years ago

Great question. I don't immediately have these - so I'll ask the team for specifics and to share them here.

simondemeule commented 2 years ago

Hi! Just a curious onlooker chiming in.

The hardware you are developing is awesome, and there is one specific use case I could see it absolutely shredding: eye-tracking. With the right software, this could possibly match the performance of multi-thousand-dollar research-grade eye-tracking devices.

Classical algorithms for eye-tracking rely on point light sources creating sharp reflections on the eye; through some geometric formulation of the problem, it is possible to recover eye position very accurately. Depending on the number of cameras and number of light sources, the problem becomes either under or over-constrained, and can be solved with different degrees of accuracy, with or without assumptions about head position. There are also likely some more modern machine-learning based approaches to this problem, but fundamentally, camera and light source count define how the problem is under or over-constrained. Two cameras and one point source is very nice, because it allows accurate inference of head position in 3D space. Most eye-trackers around this price point lack a stereo infrared camera pair; this requires the user to keep their head at a calibrated distance from the eye-tracking device. People need to refrain from moving around naturally for things to work properly, which isn't great for something like interactive art installations.

In all cases, having high resolution and framerate is important for good eye-tracking. It seems like the stereo pair you have fits the bill nicely - the resolution is likely high enough to get a decent patch of pixels on the eyes, and the framerate is great. Another characteristic of higher-end, stereo eye-tracking systems I've noticed is that the spacing between the pairs is often wider than that of the OAK-D. With a bit of maths it would be possible to find an optimal field-of-view and baseline width for this task, but that would imply making a completely new hardware design, which maybe is a bit much to ask right now.

However, there is one possibly straightforward optional change that could open a lot of possibilities for this application. If it were possible to produce a variant of this that substitutes the RGB-Bayer, IR-filtered IMX378 for its infrared-capable, monochromatic variant, this could open up the door to high-resolution pupillometry in dark lighting conditions - the ability to accurately measure pupil diameter in real time. This can be a very interesting data point for various applications. Since pupil diameter is influenced by many psychological factors, but also by the amount of light the eye receives, this is essentially always done in controlled lighting environments that are often dark - hence it makes sense to use infrared. The resolution and field of view of the stereo cameras would make it challenging to derive good pupil diameter estimates, but the central camera matches up with what is seen in some research-grade setups, given it is placed close enough to the user.

And also, just a silly question coming from the aesthetics-obsessed part of me: I assume the white rectangle in the prototype between the IR laser and LED is just the PCB - will the production unit be black?

Luxonis-Brandon commented 2 years ago

Thanks @simondemeule ! Yes, the white part goes away in production. Here's more what final production will look like:

image

image OAK-D-S2 (passive depth and CV) is the top one, and OAK-D-Pro is the bottom one in the image.

Here's OAK-D-Pro by itself:

image

And for the change you propose: yes, we can likely work with ArduCam to do some custom builds of this type. It should be relatively straightforward to do so.

Please feel free to reach out to support@luxonis.com (and link to your comment, please, if you don't mind) to see about the costs involved to produce the number of devices you would like.

Thanks again, Brandon

Luxonis-Brandon commented 2 years ago

image

sukhrajklair commented 2 years ago

Hi! Are you planning to build a POE version of this model like the OAK-D-POE?

Luxonis-Brandon commented 2 years ago

Yes see here: https://github.com/luxonis/depthai-hardware/issues/142#issuecomment-998975585

image

Luxonis-Brandon commented 2 years ago

We're adding the capability to use USB cables that can fasten to the device.
image

simondemeule commented 2 years ago

@Luxonis-Brandon awesome! I'm happy to know we might be able to make this a reality. I'll be in touch.

The production units look nice, this is great industrial design!

Luxonis-Brandon commented 2 years ago

Thanks @simondemeule ! Oh and given that you are into machines that make art, check this out done w/ OAK-D-Lite:

image

https://github.com/keijiro/DepthAITestbed

With OAK-D-Pro I'm sure there will be even more interesting things that can be done. :-).

melsanharawi commented 2 years ago

Hi guys,

Seems to be a wonderful project. Since depth perception should be more accurate with this new version, do you think it could be used for 3D scanning? A 3% depth error was announced for the OAK-D; do you have an idea of the depth accuracy of this new version?

Thanks a lot

Erol444 commented 2 years ago

Hello @melsanharawi, so the below-3% depth error rate is when you have near-perfect conditions, i.e. great lighting and good texture. When you don't have these, it can be a lot worse (completely random), since disparity matching can't do its job. The Pro having a laser dot projector helps with that, so the error rate stays below 3%. Thanks, Erik

Luxonis-Brandon commented 2 years ago

We also now have ToF for higher accuracy depth for scanning applications.

simondemeule commented 2 years ago

@Luxonis-Brandon whoa! That is huge! Is this on a different hardware platform using dedicated ToF image sensors, or are you somehow able to extract accurate-enough timing information from this hardware to create ToF depth maps?

Unrelated to this, I've been a bit slow getting in touch with ArduCam but will do shortly - I would really love to see this project idea come to fruition.

michaelkr commented 2 years ago

We also now have ToF for higher accuracy depth for scanning applications.

Any additional details on this? What sensor is providing this, or the resolution...seems like a large feature add, I'm fascinated!

themarpe commented 2 years ago

Hi @simondemeule and @michaelkr, check the following: https://docs.luxonis.com/projects/hardware/en/latest/#modular-cameras-designs and specifically the ToF: https://docs.luxonis.com/projects/hardware/en/latest/pages/DM0255.html

To answer your question, Simon, this is a separate ToF sensor, which can be used with our "FFC" lineup. We don't yet have an off-the-shelf product integrating it, but that might change in the future.

BZandi commented 2 years ago

However, there is one possibly straightforward optional change that could open a lot of possibilities for this application. If it were possible to produce a variant of this that substitutes the RGB-Bayer, IR-filtered IMX378 for its infrared-capable, monochromatic variant, this could open up the door to high-resolution pupillometry in dark lighting conditions - the ability to accurately measure pupil diameter in real time. This can be a very interesting data point for various applications. Since pupil diameter is influenced by many psychological factors, but also by the amount of light the eye receives, this is essentially always done in controlled lighting environments that are often dark - hence it makes sense to use infrared. The resolution and field of view of the stereo cameras would make it challenging to derive good pupil diameter estimates, but the central camera matches up with what is seen in some research-grade setups, given it is placed close enough to the user.

@simondemeule If you'd like to do high-resolution pupillometry on a limited budget, the PupilEXT GitHub repo could also be of interest to you: https://github.com/openPupil/Open-PupilEXT

Erol444 commented 1 year ago

Already released, see product documentation here