[BUG] Multiple OAK-D devices not closing properly #355

Closed John-Dean closed 3 years ago

John-Dean commented 3 years ago

Describe the bug When using multiple OAK-D devices over USB3, the final device does not close properly when the devices are closed (meaning you can't create a new Device from it). Power cycling the device resolves this.

To Reproduce Steps to reproduce the behavior:

  1. Run script:
  2. Observe "Connected to XXXXXX" in the console twice
  3. Quit the script (press q, or CTRL+C, either results in the same issue)
  4. Re-run the script
  5. Observe "Connected to XXXXXX" in the console once - potentially followed by an error in the format: RuntimeError: Failed to find device (14442C1041A809D100-ma2480), error message: X_LINK_DEVICE_NOT_FOUND

Expected behavior Running the script multiple times works the same every time, without needing a power cycle.

Screenshots Imgur Image

Attach system log

    "architecture": "64bit WindowsPE",
    "machine": "AMD64",
    "platform": "Windows-10-10.0.19041-SP0",
    "processor": "AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD",
    "python_build": "tags/v3.9.4:1f2e308 Apr  6 2021 13:40:21",
    "python_compiler": "MSC v.1928 64 bit (AMD64)",
    "python_implementation": "CPython",
    "python_version": "3.9.4",
    "release": "10",
    "system": "Windows",
    "version": "10.0.19041",
    "win32_ver": "10 10.0.19041 SP0 Multiprocessor Free",
    "uname": "Windows Johns-New-PC 10 10.0.19041 AMD64 AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD",
    "packages": [
    "usb": [
        "No USB backend found"

Additional context The problem does not occur when usb2Mode=True

Luxonis-Brandon commented 3 years ago

Thanks for the report @John-Dean . And sorry about the trouble here. We will try to reproduce and then debug. One thing I'm pondering is if this is particular to Windows, or if this impacts all OSes. In Windows we use a different USB implementation than all other OSes (which are libusb based).

CC: @Erol444 on this.


John-Dean commented 3 years ago

Hi Brandon,

I've tested this on Windows both with the included USB cables and with my own and found the following.

I can replicate the issue regardless of the USB port and cable used. However, if I mismatch the USB ports so that one camera is connected to the processor directly and one to the chipset (I've tested on X570 and Z170), I can replicate this basically every time, whereas if both are connected the same way (i.e. both via the chipset or both directly to the processor), it happens less frequently.

From this I agree this is probably a USB implementation issue on Windows.

Luxonis-Brandon commented 3 years ago

Thanks, @John-Dean ,

Very good data to have. Thanks for taking it all. And agreed, it seems likely that this is a USB implementation issue on Windows. @Erol444 and I synced offline and he's going to be digging into this this afternoon.

Thanks again, Brandon

Erol444 commented 3 years ago

Hello @John-Dean, thank you for reporting this issue. I have done some testing on Windows, however only had 1 such case out of ~50. I believe that is the case if you try to re-run the script without a bit of delay. Did you run the tests without a delay? Does this occur as well when you have 5/10sec in between the tests? One workaround would be to just wait for the amount of devices you have connected, or get IDs of all connected devices and try to "manually" connect to them one-by-one. We will continue to monitor this issue. Thanks, Erik
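The first workaround above (waiting until the expected number of devices is visible) could be sketched roughly as follows. This is a minimal sketch, not code from the DepthAI project: `wait_for_devices` and its parameters are hypothetical, and `list_devices` stands for any zero-argument callable that returns the currently visible devices, such as `depthai.Device.getAllAvailableDevices` as used later in this thread.

```python
import time


def wait_for_devices(list_devices, expected_count, timeout_s=10.0, poll_s=0.5):
    """Poll list_devices() until it reports expected_count devices or time runs out.

    list_devices is a hypothetical stand-in: any zero-argument callable that
    returns a list of currently visible devices.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        found = list_devices()
        if len(found) >= expected_count:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(found)} of {expected_count} devices appeared "
                f"within {timeout_s}s"
            )
        time.sleep(poll_s)
```

With two cameras attached, the call would look like `wait_for_devices(depthai.Device.getAllAvailableDevices, 2)`; the timeout and polling interval are arbitrary defaults.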

John-Dean commented 3 years ago

Hi Erik,

The majority of my tests were done with about a 5s delay, but a few with around 60s. For me it seems to trigger consistently, and once triggered it won't go away without a power cycle.

John-Dean commented 3 years ago

Apologies, just saw the end of your reply:

One workaround would be to just wait for the amount of devices you have connected, or get IDs of all connected devices and try to "manually" connect to them one-by-one. We will continue to monitor this issue.

If I try this I get the following:

RuntimeError: Failed to find device (14442C1041A809D100-ma2480), error message: X_LINK_DEVICE_NOT_FOUND

Luxonis-Brandon commented 3 years ago

Thanks. I don't know the code well - so I'm at high risk of being wrong - but I think that actually using the ID alone is what is needed, as per here, so just "14442C1041A809D100" instead of "14442C1041A809D100-ma2480".

That said, perhaps the error message is what is appending that -ma2480 to the end, and if so - sorry about the incorrect response here. :-)
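If a suffixed string such as "14442C1041A809D100-ma2480" did end up being passed around as an ID, it could be reduced to the bare ID before any lookup. The helper below is a hypothetical illustration only; it assumes the suffix always follows the first hyphen, which is inferred from the error message above rather than from any documented format.

```python
def bare_mx_id(device_name):
    """Return the text before the first '-', dropping a suffix such as '-ma2480'.

    Assumes the ID itself contains no hyphen (an assumption based on the
    error message quoted in this thread).
    """
    return device_name.split("-", 1)[0]
```

A string without a hyphen is returned unchanged, so the helper is safe to apply to IDs that are already bare.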

And as a heads up @Erol444 is in Europe, so he'll likely be back on early in the morning our time to reply.

John-Dean commented 3 years ago

I'm in Europe also, so no issues there!

Yes, I believe it's the error adding that - at least I think it's not in my code!

Erol444 commented 3 years ago

Really interesting. Did you follow the instructions from the docs? Otherwise I would love to see your code as well and check if it works on my machine:) Thanks!

John-Dean commented 3 years ago

I'll grab the code I am using towards the end of the day (working my day job at the moment :D).

My code is just trying to grab 2 cameras depth feeds at the moment.

Basically my end game goal is to grab the depth feed of 2/3 cameras in as close to real time and as high framerate (with API v2 I can't seem to get a depth output at >60fps, but I haven't really messed around with this either) as possible, then convert it to a point cloud.
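For readers unfamiliar with the depth-to-point-cloud step mentioned above, here is a minimal pure-Python sketch using the standard pinhole camera model. The function name and the intrinsics (`fx`, `fy`, `cx`, `cy`) are placeholders, not values from this thread; real intrinsics would come from the camera's calibration, and a real-time pipeline would likely use a vectorised array library instead of nested loops.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (a list of rows) into a list of (x, y, z) points.

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with a depth of 0 or less are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The output points share the units of the depth values, so depth in millimetres yields coordinates in millimetres.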

I want to then run multiway registration on the pointclouds (I'm only using pointclouds here because the documentation is for pointclouds, but in the long run I want to skip this and just use depth images directly) to create a combined 3D point cloud akin to:

There are loads of applications for a real-time scanned 3D scene like this: use in fashion to preview clothes on people; use in VR (how I wish to use it) for better room-scale tracking, 3D object identification and pose estimation (such as identifying objects like water bottles, tennis rackets and pets, and passing them through into VR so you don't stand on them!); and, if you could drone-mount the cameras with this setup, full 3D scans of buildings and environments with ease.

John-Dean commented 3 years ago

So after some cleanup of my code I have the following three python functions (it's very similar to the code)

(Using cv2, depthai and contextlib as imports)

The create_pipeline function creates and returns a depthai pipeline that has colour and depth outputs called rgb and depth

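def create_pipeline():
    # Start defining a pipeline
    pipeline = depthai.Pipeline()

    # NOTE: the socket assignments, links and stream names below are assumed
    # from the description above (outputs called "rgb" and "depth")

    # Define a source - color camera
    cam_rgb = pipeline.createColorCamera()
    cam_rgb.setPreviewSize(600, 600)

    # Define a source - two mono (grayscale) cameras
    left = pipeline.createMonoCamera()
    left.setBoardSocket(depthai.CameraBoardSocket.LEFT)
    right = pipeline.createMonoCamera()
    right.setBoardSocket(depthai.CameraBoardSocket.RIGHT)

    # Define a source - depth camera, fed by the two mono cameras
    depth = pipeline.createStereoDepth()
    left.out.link(depth.left)
    right.out.link(depth.right)

    # Create output (the disparity output is assumed as the depth feed)
    xout_rgb = pipeline.createXLinkOut()
    xout_rgb.setStreamName("rgb")
    cam_rgb.preview.link(xout_rgb.input)
    xout_depth = pipeline.createXLinkOut()
    xout_depth.setStreamName("depth")
    depth.disparity.link(xout_depth.input)

    return pipeline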

The get_all_cameras function takes a pipeline and returns a list of devices, a list of rgb source queues and a list of depth source queues.

def get_all_cameras(pipeline):
    devices = []

    rgb_sources = []
    depth_sources = []

    for device_info in depthai.Device.getAllAvailableDevices():
        device = depthai.Device(pipeline=pipeline, deviceDesc=device_info, usb2Mode=False)
        devices.append(device)

        print("Connected to " + device_info.getMxId())
        # Output queues deliver the frames from the "rgb" and "depth" streams
        queue_rgb = device.getOutputQueue(name="rgb", maxSize=1, blocking=False)
        rgb_sources.append(queue_rgb)
        queue_depth = device.getOutputQueue(name="depth", maxSize=1, blocking=False)
        depth_sources.append(queue_depth)

    return devices, rgb_sources, depth_sources

And finally the close_cameras function takes a list of devices and closes them (I was originally using the contextlib.ExitStack() approach described in the file, but I found this harder to troubleshoot which is why I swapped to this approach).

def close_cameras(devices):
    for device in devices:
        device.close()
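For comparison, the contextlib.ExitStack approach mentioned above can be sketched without any hardware. This is a minimal illustration, not the code from the referenced file: `FakeDevice` and `run_with_devices` are hypothetical names, and the only assumption about a real device object is that it offers a `close()` method.

```python
import contextlib


class FakeDevice:
    """Hypothetical stand-in for a camera handle that only tracks close()."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


def run_with_devices(devices, work):
    """Run work(devices), then close every device, even on an error or CTRL+C."""
    with contextlib.ExitStack() as stack:
        for device in devices:
            # callback() registers close() to run when the with-block exits
            stack.callback(device.close)
        return work(devices)
```

Because the callbacks run on any exit from the `with` block, including a `KeyboardInterrupt`, every registered device gets its `close()` call without an explicit cleanup loop. The trade-off noted above still applies: the cleanup is implicit, which can make it harder to see which device failed to close.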

Finally the wrapper code to run it all:

pipeline = create_pipeline()

devices, rgbs, depths = get_all_cameras(pipeline)

running = True
while running:  
    for i, q_rgb in enumerate(rgbs):
        in_rgb = q_rgb.tryGet()
        if in_rgb is not None:
            cv2.imshow("rgb-" + str(i + 1), in_rgb.getCvFrame())

    if cv2.waitKey(1) == ord('q'):
        running = False

# Close every device once the loop ends
close_cameras(devices)


In total the file looks like this:

import cv2
import depthai
import contextlib

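def create_pipeline():
    # Start defining a pipeline
    pipeline = depthai.Pipeline()

    # NOTE: the socket assignments, links and stream names below are assumed
    # from the description above (outputs called "rgb" and "depth")

    # Define a source - color camera
    cam_rgb = pipeline.createColorCamera()
    cam_rgb.setPreviewSize(600, 600)

    # Define a source - two mono (grayscale) cameras
    left = pipeline.createMonoCamera()
    left.setBoardSocket(depthai.CameraBoardSocket.LEFT)
    right = pipeline.createMonoCamera()
    right.setBoardSocket(depthai.CameraBoardSocket.RIGHT)

    # Define a source - depth camera, fed by the two mono cameras
    depth = pipeline.createStereoDepth()
    left.out.link(depth.left)
    right.out.link(depth.right)

    # Create output (the disparity output is assumed as the depth feed)
    xout_rgb = pipeline.createXLinkOut()
    xout_rgb.setStreamName("rgb")
    cam_rgb.preview.link(xout_rgb.input)
    xout_depth = pipeline.createXLinkOut()
    xout_depth.setStreamName("depth")
    depth.disparity.link(xout_depth.input)

    return pipeline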

def get_all_cameras(pipeline):
    devices = []

    rgb_sources = []
    depth_sources = []

    for device_info in depthai.Device.getAllAvailableDevices():
        # usb2Mode=True avoids the issue; set usb2Mode=False to reproduce it
        device = depthai.Device(pipeline=pipeline, deviceDesc=device_info, usb2Mode=True)
        devices.append(device)

        print("Connected to " + device_info.getMxId())
        # Output queues deliver the frames from the "rgb" and "depth" streams
        queue_rgb = device.getOutputQueue(name="rgb", maxSize=1, blocking=False)
        rgb_sources.append(queue_rgb)
        queue_depth = device.getOutputQueue(name="depth", maxSize=1, blocking=False)
        depth_sources.append(queue_depth)

    return devices, rgb_sources, depth_sources

def close_cameras(devices):
    for device in devices:
        device.close()

pipeline = create_pipeline()

devices, rgbs, depths = get_all_cameras(pipeline)

running = True
while running:  
    for i, q_rgb in enumerate(rgbs):
        in_rgb = q_rgb.tryGet()
        if in_rgb is not None:
            cv2.imshow("rgb-" + str(i + 1), in_rgb.getCvFrame())

    if cv2.waitKey(1) == ord('q'):
        running = False

# Close every device once the loop ends
close_cameras(devices)


I can confirm that it is primarily occurring when using devices on different USB controllers. Image of issue

In this screenshot the first 3 executions were taken on my front connector USB ports (ones connected via a USB 3 header). At the red line I swap one of the cameras to use a USB directly attached to the motherboard and I have issues (the 2nd camera isn't connecting when it's run again). If I move them both to the motherboard USB then there is no issue.

Ideally they would be connected via different streams to not run into bandwidth limitations on a USB controller (this is a common issue with SteamVR basestations and tracker dongles overwhelming USB controllers), but I can accept running them this way.

Luxonis-Brandon commented 3 years ago

Thanks for the details @John-Dean . What computer are you using by the way? Perhaps we can get it so we can reproduce exactly. We are needing a new build machine anyway, so after reproducing and fixing, assuming it's a quick machine, we can repurpose it as a build machine (and if not quick, use it as an automated QA machine).

Thanks, Brandon

John-Dean commented 3 years ago

Hi Brandon,

Testing on:

Using combinations of the USB cables:

I'm using the Fractal Meshify 2 Compact case, which has two front-panel USB 3.2 Gen 1 ports and a front-panel USB 3.2 Gen 2 Type-C port (I think the naming conventions on those are right). If I connect both cameras, using any assortment of cables, to the motherboard USB ports that are wired to the processor, both to the chipset USB ports, both to the front-panel USB ports, or one to a front-panel USB port and one to the front-panel USB-C port, I have very few issues (I would estimate similar to the 1-in-50 rate Erik mentioned above). If I connect them so that one goes via the processor and one via the chipset in any capacity (i.e. one on the motherboard USBs and one on the front panel), I have the issue almost every time, so I am leaning towards this being a chipset compatibility issue.

I'll try to update to the latest chipset drivers and a beta BIOS tomorrow and see if that fixes the issues magically.

If you're looking to build the system yourself I would advise against it:

The AM4 platform (especially X570) has USB driver issues (surprise surprise) which I suspect might be playing a role in this issue. From all the reading I have done, it doesn't sound like the USB driver issues that plague AM4 are the root cause (since I also validated it happening on an older Intel Z170 system), but they could be why it's consistent for me. Those driver issues should result in slower speeds or disconnects, not the inability to connect.

Also, Samsung NVME drives can cause Bluescreens while on AM4 systems, so you have to manage your Windows PCIe power management settings to be high performance to avoid the issue - something that means if you have other PCIe devices attached (i.e. network cards, GPUs) it will cause them to also get this high performance power setting which can stop some of them from idling.

Finally, my biggest issue is that the 5950x, 5900x and to a lesser extent the 5800x all have issues with WHEA errors causing processor RMAs (which annoyingly is what plagues my current system). The 'fix' is to stop all boosting/overclocking/idling functionality (so the processor is locked at stock frequencies), which is what I am currently doing, and to request an RMA. I don't think that is the cause of this issue as it just causes outright system crashes instead of driver or connected device instability.

Long story short I was using Intel Z170 as a work machine for the last 6ish years with literally 0 issues. I have a new workload that required much greater GPU and CPU resources so I went for the best in slot on both and have had significant issues since upgrading. I can't recommend the 5000 series AMD platform as stable. It's frustrating as if Intel had a competitive high core count processor I would have chosen that instead, but I am left with no option but AMD for my usecase.

Luxonis-Brandon commented 3 years ago

Thank you for all the insight here! This also makes a lot of sense as we too have seen USB issues with some AMD chipsets. I can't exactly remember the specs of the machine, but it was a custom build just like this. I'll have to look up the specs as another engineer has the system now. Actually we can probably install Windows on one of these AMD machines (or build a more modern AMD machine) and see if we can reproduce.

And sounds good on the BIOS update.


Thanks again, Brandon

John-Dean commented 3 years ago

Updated to the latest BIOS (including beta) and latest X570 chipset drivers and I can confirm the issue is still present. It does, however, now go away, and both cameras connect after a couple of minutes of waiting, whereas before they refused to reconnect even after 30 minutes.
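Since the cameras now reconnect after a wait, one stopgap would be to retry opening a device a few times with a pause in between. The sketch below is a hypothetical helper, not part of DepthAI: `open_device` is assumed to be a zero-argument callable that creates a device and raises RuntimeError on failure, as with the X_LINK_DEVICE_NOT_FOUND error earlier in this thread. The attempt count and delay are arbitrary.

```python
import time


def open_with_retry(open_device, attempts=5, delay_s=30.0, sleep=time.sleep):
    """Call open_device() until it succeeds, pausing delay_s after each failure.

    open_device is a hypothetical zero-argument callable that returns a device
    or raises RuntimeError. The last error is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return open_device()
        except RuntimeError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay_s)
    raise last_error
```

With the code from this thread, `open_device` could be something like `lambda: depthai.Device(pipeline=pipeline, deviceDesc=device_info, usb2Mode=False)`. This only masks the symptom; it does nothing about the underlying reconnect delay.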

From this I am guessing it is more than likely a compatibility issue between the X570 drivers and the Windows USB library you are using.

I have an open support ticket with AMD about other platform instabilities and I will raise this as an example if that's okay with you? I can't promise that it will go anywhere.

Thanks, John

Luxonis-Brandon commented 3 years ago

Thanks @John-Dean for the thorough testing and investigation and reporting here. And yes, please do, and feel free to cite this.

And thanks again for doing so, -Brandon

John-Dean commented 3 years ago

Between a mixture of AMD updates and the latest DepthAI updates, I can't replicate this anymore, so for now I think it's safe to say this is fixed. Thanks for looking into this.

Luxonis-Brandon commented 3 years ago

Thanks for circling back @John-Dean ! (And great to hear!)