luxonis / depthai-core

DepthAI C++ Library
MIT License
235 stars, 127 forks

[BUG] Inconsistent device search results after compute reboot or long time idle. #777

developer-mayuan opened this issue 1 year ago (status: Open)

developer-mayuan commented 1 year ago

Hi Luxonis, I recently got inconsistent results when bringing up four USB OAK-D (1st gen) cameras in parallel (using ROS nodelets) after rebooting the compute or leaving the cameras idle for a while.

The inconsistency happens in the device search stage. Sometimes all of my cameras are found by their MxIDs and the DeviceInfo returned for each one is correct [Fig. 1: successful_search]. Other times, one or more devices can't be found, or the returned device info belongs to a different device [Fig. 2: unsuccessful_search].
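Because one failure mode is a search result whose device info belongs to a different camera, one defensive option is to compare the MxID in each result against the MxID that was requested before opening the device. The sketch below assumes C++17 and models the result on `dai::Device::getDeviceByMxId()`, which returns a `std::tuple<bool, DeviceInfo>`; `DeviceInfoStub`, `isValidResult`, and `badResults` are hypothetical names for illustration, not part of depthai-core.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for dai::DeviceInfo, reduced to the fields used here.
struct DeviceInfoStub {
    std::string mxid;  // MxID the device reported
    std::string name;  // XLink name / USB path the device was found on
};

// Same shape as the value returned by dai::Device::getDeviceByMxId():
// a "found" flag plus the device info.
using SearchResult = std::tuple<bool, DeviceInfoStub>;

// A result is trusted only if the device was found AND the info really
// belongs to the MxID that was asked for.
bool isValidResult(const SearchResult& result, const std::string& requestedMxId) {
    const auto& [found, info] = result;
    return found && info.mxid == requestedMxId;
}

// Returns the requested MxIDs whose result is missing or belongs to another
// device. results[i] is assumed to be the answer for requested[i].
std::vector<std::string> badResults(const std::vector<std::string>& requested,
                                    const std::vector<SearchResult>& results) {
    assert(requested.size() == results.size());
    std::vector<std::string> bad;
    for (std::size_t i = 0; i < requested.size(); ++i) {
        if (!isValidResult(results[i], requested[i])) {
            bad.push_back(requested[i]);
        }
    }
    return bad;
}
```

Any MxID reported by such a check would then be re-queried after a short wait rather than opened.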

After some discussion with Erik, who suggested ways to address the stability issue, I changed my camera driver as follows:

  1. Launch the driver nodelets sequentially, one per camera.
  2. Add delays before restarting camera driver nodelets.
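Taken together, the two changes above serialize the bring-up. A minimal sketch in plain C++, assuming a hypothetical `startCamera` callback that returns `true` once a device is opened (in this setup it would correspond to loading one nodelet); `bringUpSequentially`, the retry delay, and the attempt count are illustrative placeholders, not values recommended by Luxonis.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Starts cameras one at a time instead of in parallel. A failed start is
// retried after `retryDelay`, up to `maxAttempts` times per camera.
// Returns the MxIDs that could not be started.
std::vector<std::string> bringUpSequentially(
        const std::vector<std::string>& mxids,
        const std::function<bool(const std::string&)>& startCamera,  // hypothetical
        std::chrono::milliseconds retryDelay,
        int maxAttempts) {
    assert(maxAttempts > 0);
    std::vector<std::string> failed;
    for (const auto& mxid : mxids) {
        bool started = false;
        for (int attempt = 0; attempt < maxAttempts && !started; ++attempt) {
            if (attempt > 0) {
                // Give a device that may still be rebooting time to settle.
                std::this_thread::sleep_for(retryDelay);
            }
            started = startCamera(mxid);
        }
        if (!started) {
            failed.push_back(mxid);
        }
    }
    return failed;
}
```

Starting the next camera only after the previous one has finished keeps devices from being in transitional states at the same time, which is the intent behind launching sequentially.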

With these changes, my driver node is more stable. However, I still hit the bring-up issue, and it is easy to reproduce with the following steps:

  1. Reboot the compute
  2. Query the connected camera devices using a script like this one: https://github.com/luxonis/depthai-core/blob/main/examples/host_side/device_information.cpp

If the console shows any warning messages, the bring-up issue will occur: [Screenshot 1]

If, instead of launching the camera driver node, I run the query script again right after the first run, fewer devices print warnings: [Screenshot 2]

If I run the query script a third time, every device is in a normal state: [Screenshot 3]. After this health check, I can always launch all of my camera driver nodes successfully.
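The manual health check described above, rerunning the query until no device reports a problem, can be automated. A minimal sketch, assuming a hypothetical `queryHealthy` callback that returns the MxIDs found without warnings in one query pass (in real code it might be built from `dai::Device::getAllAvailableDevices()`); `waitUntilAllHealthy` and its parameters are illustrative, not a depthai-core API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Re-runs a device query until every expected MxID is reported as healthy,
// mimicking running the query script repeatedly by hand.
// Returns the number of passes needed, or -1 if some device is still
// unhealthy after maxPasses.
int waitUntilAllHealthy(const std::vector<std::string>& expected,
                        const std::function<std::vector<std::string>()>& queryHealthy,
                        std::chrono::milliseconds pause,
                        int maxPasses) {
    assert(maxPasses > 0);
    for (int pass = 1; pass <= maxPasses; ++pass) {
        const std::vector<std::string> healthy = queryHealthy();
        const bool allHealthy = std::all_of(
            expected.begin(), expected.end(), [&](const std::string& id) {
                return std::find(healthy.begin(), healthy.end(), id) != healthy.end();
            });
        if (allHealthy) {
            return pass;
        }
        if (pass < maxPasses) {
            std::this_thread::sleep_for(pause);  // wait before the next query
        }
    }
    return -1;
}
```

Only after this returns a positive pass count would the driver nodes be launched.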

themarpe commented 1 year ago

@developer-mayuan

We have yet to add helper functions for this, but in a nutshell: a device transitions between states when it starts and stops. While it is transitioning, a device isn't connectable and should be waited on until it is in a good state again.
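Until such helpers exist, a client can poll a device's state with a deadline instead of connecting immediately. In depthai-core the state is reported as an `XLinkDeviceState_t` (with values such as `X_LINK_UNBOOTED`, `X_LINK_BOOTLOADER`, and `X_LINK_BOOTED`); the `DevState` enum below is a simplified, hypothetical stand-in, and treating only `Ready` as connectable is an assumption for illustration. `waitForConnectable` and the `probe` callback are likewise hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Simplified, hypothetical device states (not the real XLinkDeviceState_t).
enum class DevState { Absent, Rebooting, Ready, InUse };

// In this sketch, only a device that is present and idle is connectable.
bool isConnectable(DevState s) { return s == DevState::Ready; }

// Polls `probe` (hypothetical: reports the current state of one device)
// until the device is connectable or `timeout` elapses.
// Returns the last state observed.
DevState waitForConnectable(const std::function<DevState()>& probe,
                            std::chrono::milliseconds timeout,
                            std::chrono::milliseconds pollInterval) {
    assert(timeout.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    DevState state = probe();
    while (!isConnectable(state) && std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(pollInterval);
        state = probe();
    }
    return state;
}
```

Using `steady_clock` for the deadline keeps the timeout unaffected by wall-clock adjustments, which can happen shortly after a compute reboot.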

developer-mayuan commented 1 year ago

Hi @themarpe, thanks for the quick reply! I'm wondering if there's currently a more official way to check whether an OAK camera is in a good state before using it (besides querying devices as I have done). I'm also curious why the camera is often, if not always, not in a good state after rebooting the compute.

themarpe commented 1 year ago

"I'm wondering if there's currently a more official way to check whether an OAK camera is in a good state before using it"

At the moment, devices have to be queried. You can also tell the query not to skip invalid devices and then filter those out yourself.
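Not skipping invalid devices lets a client tell "not present" apart from "present but not usable yet". To my understanding, in depthai-core this corresponds to the `skipInvalidDevices` argument of `dai::XLinkConnection::getAllConnectedDevices()`, with each returned `dai::DeviceInfo` carrying a `status`; check the headers of the version you use. The sketch below uses hypothetical stand-ins (`ReportedDevice`, `Status`, `classify`) to show the filtering step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for dai::DeviceInfo and its status field.
enum class Status { Ok, InsufficientPermissions, Error };
struct ReportedDevice {
    std::string mxid;
    Status status;
};

struct Partition {
    std::vector<std::string> usable;   // reported and valid
    std::vector<std::string> invalid;  // reported but not usable (yet)
    std::vector<std::string> missing;  // expected but not reported at all
};

// Sorts the expected MxIDs into usable / invalid / missing based on one
// unfiltered query result.
Partition classify(const std::vector<std::string>& expected,
                   const std::vector<ReportedDevice>& reported) {
    Partition p;
    for (const auto& id : expected) {
        const ReportedDevice* hit = nullptr;
        for (const auto& d : reported) {
            if (d.mxid == id) {
                hit = &d;
                break;
            }
        }
        if (hit == nullptr) {
            p.missing.push_back(id);
        } else if (hit->status == Status::Ok) {
            p.usable.push_back(id);
        } else {
            p.invalid.push_back(id);
        }
    }
    return p;
}
```

A caller might wait and re-query while `invalid` is non-empty, but treat a persistently `missing` device as a cabling or power problem.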

"I'm also curious why the camera is often, if not always, not in a good state after rebooting the compute."

We haven't explored this much yet, although we've heard about it a couple of times. Unfortunately, I don't have a good answer on this topic at the moment :/