luxonis / depthai-core

DepthAI C++ Library
MIT License

[BUG] Intermittent Issues with OAK-D W PoE Cameras Streaming Left Camera Data #1042

Open guilhermedemouraa opened 2 weeks ago

guilhermedemouraa commented 2 weeks ago

Describe the bug

I'm experiencing intermittent issues streaming data from the left camera of OAK-D W PoE cameras. The left camera data is not always present, and the device sometimes fails to be recognized when the service restarts. This is the log I get from my service:

Failed to resolve stream: Failed to find device after booting, error message: X_LINK_DEVICE_NOT_FOUND

At other times, I get this log:

[1844301051BEC41200] [10.95.76.11] [1718734907.447] [host] [warning] There was a fatal error. Crash dump saved to /tmp/depthai_LR3TVl/1844301051BEC41200-depthai_crash_dump.json

I'm attaching the crash reports here, but they all seem to point to code bugs.

Setup Details:

Streaming Configuration:

Issues Encountered:

Intermittent Left Camera Data:

Occasionally, after rebooting the PC, the left camera stream works without any issues.

Minimal Reproducible Example

It's hard to include everything here: I have a gRPC server that streams the OAK topics and a Python gRPC client that subscribes to them. Here's how I create my dai pipeline:

void add_to_pipeline_mono(
    std::shared_ptr<dai::Pipeline> pipeline,
    std::shared_ptr<dai::node::MonoCamera> cam,
    const MonoStreamOptions &options,
    std::shared_ptr<dai::node::XLinkIn> xin_tracked_features_config) {

  auto xout_img = pipeline->create<dai::node::XLinkOut>();
  xout_img->setStreamName(std::string(options.queue_name));

  // As per depthai docs, reducing Isp3a rate (it defaults to capture fps)
  // reduces CPU load and seems to help with performance, especially when
  // performing feature tracking.
  cam->setIsp3aFps(5);
  cam->setBoardSocket(to_camera_board_socket(options.camera_board_socket));
  cam->setResolution(to_sensor_resolution(options.resolution));
  cam->setFps(options.fps);

  cam->out.link(xout_img->input); // Link raw output directly

  if (options.enabled && xin_tracked_features_config) {
    auto xout_tracked_features = pipeline->create<dai::node::XLinkOut>();
    xout_tracked_features->setStreamName(std::string(options.tracked_features_queue_name));

    // Set number of shaves and number of memory slices to maximum as per
    // depthai documentation to ensure good performance.
    auto feature_tracker = pipeline->create<dai::node::FeatureTracker>();
    feature_tracker->setHardwareResources(2, 2);

    cam->out.link(feature_tracker->inputImage);
    feature_tracker->outputFeatures.link(xout_tracked_features->input);

    xin_tracked_features_config->out.link(feature_tracker->inputConfig);
  }
}

I've tried both with and without the cam->setIsp3aFps(5) call.

Expected behavior

I expect the left camera data to be consistently available and the device to be reliably recognized upon service restart.

Attach system log

depthai_DX4bJu_1844301051BEC41200-depthai_crash_dump.json
depthai_tYFDEE_18443010E147C31200-depthai_crash_dump.json
depthai_Zgzmrx_1844301051BEC41200-depthai_crash_dump.json

Additional context

Described above.

moratom commented 2 weeks ago

@jakaskerl would you mind checking if we can reproduce the issue?

I think this will likely fall down to a HW issue.

guilhermedemouraa commented 2 weeks ago

Sorry, I forgot to mention that I have two OAKs connected to my laptop (oak0 and oak1). I have seen the same problem with both of them. Sometimes on boot I get oak0/left but not oak1/left. Sometimes it's the other way around (I get oak1/left but not oak0/left).

The fact that it happens with both cameras makes me question whether it is really a hardware issue. Please let me know if there's any further information I can share to help you better understand the issue.

guilhermedemouraa commented 1 week ago

Can anyone please provide me with an update? @moratom @jakaskerl

jakaskerl commented 3 days ago

Hi @guilhermedemouraa, sorry for the late response. It's hard to say what the actual issue is, since the code looks OK. I'd suggest updating to the latest depthai (2.27) and dropping cam->setIsp3aFps(5), since it introduces more issues than it solves.

Failed to resolve stream: Failed to find device after booting, error message: X_LINK_DEVICE_NOT_FOUND

This indicates a power issue. Could you recheck the power source (injector/switch)? Swap it for a different one if you can.

If the issue is pipeline related, it is most likely caused by the feature tracker. I'm not sure what configuration you are using, but we have had issues with it before. The docs page states that the supported resolutions are 480p and 720p, whereas you are using 800p.

Thanks, Jaka
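To make the resolution point concrete, here is a minimal sketch of a guard that could run before linking a camera into the feature tracker. It is only an illustration: `MonoResolution`, `kFeatureTrackerResolutions`, and `feature_tracker_supports` are hypothetical names, and the supported list simply mirrors the 480p/720p statement above rather than anything checked against the firmware. In a real pipeline the enum would presumably be `dai::MonoCameraProperties::SensorResolution` (for example `THE_720_P` instead of `THE_800_P`).

```cpp
#include <algorithm>
#include <array>

// Hypothetical stand-in for the mono sensor resolutions mentioned in this thread.
enum class MonoResolution { P400, P480, P720, P800 };

// Assumption: the resolutions the docs page is said to list for FeatureTracker.
constexpr std::array<MonoResolution, 2> kFeatureTrackerResolutions = {
    MonoResolution::P480, MonoResolution::P720};

// Returns true if the resolution is in the assumed supported list.
inline bool feature_tracker_supports(MonoResolution resolution) {
    return std::find(kFeatureTrackerResolutions.begin(),
                     kFeatureTrackerResolutions.end(),
                     resolution) != kFeatureTrackerResolutions.end();
}
```

A check like this could sit at the top of `add_to_pipeline_mono` and skip (or log) the feature tracker branch when the configured resolution is 800p.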

guilhermedemouraa commented 2 days ago

Thanks for getting back to me, @jakaskerl. After some further testing, it seems that the real issue occurs when I "drop the camera" and then try to open it again. I wonder if there are recommendations/best practices for gracefully shutting down the device.

For more context, here's what I did:

I believe that somewhere in this process, the device crashed. In fact, I cannot even ping it.

Here are some logs from my service:

___$ [184430103163C41200] [10.95.76.10] [1719965092.987] [host] [warning] Device crashed, but no crash dump could be extracted.
[WARN  farm_ng_stream::events::topic_manager] Failed to resolve stream: Device already closed or disconnected: Input/output error
[INFO  farm_ng_stream::service::oak_manager] Opening camera 10.95.76.10
[ERROR farm_ng_stream::service::event_grpc] No matching topics: "oak/0/left"
[DEBUG hyper::proto::h2::server] send response error: user error: unexpected frame type
[DEBUG hyper::proto::h2::server] stream error: http2 error: user error: unexpected frame type
[WARN  farm_ng_stream::events::topic_manager] Failed to resolve stream: Cannot find any device with given deviceInfo
[INFO  farm_ng_stream::service::oak_manager] Opening camera 10.95.76.10
[ERROR farm_ng_stream::service::event_grpc] No matching topics: "oak/0/left"
[DEBUG hyper::proto::h2::server] send response error: user error: unexpected frame type

The message "Device crashed, but no crash dump could be extracted" is odd, to say the least.

To be clear, I'm not powering it down. The power source is never touched. However, the code that creates the pipeline and actively subscribes to the camera stream goes out of scope, so to "reopen" the camera I need to start the pipeline all over again.

I would appreciate your fast communication on this issue. We have more than 400 oak cameras at farm-ng and can't afford to have them not working properly.
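For the reopen path described above, one possible mitigation on the host side is to retry the open with a growing delay instead of failing on the first "device not found". This is a sketch under assumptions, not a confirmed fix: it assumes the device only needs some time before it can be found again, and `open_with_retry` is a hypothetical helper. The `try_open` callable would wrap whatever constructs the device and starts the pipeline; errors thrown from it are treated as a failed attempt.

```cpp
#include <chrono>
#include <exception>
#include <thread>

// Hypothetical helper: calls try_open() up to max_attempts times, doubling
// the wait between attempts. Returns true as soon as one attempt succeeds.
template <typename OpenFn>
bool open_with_retry(OpenFn&& try_open, int max_attempts,
                     std::chrono::milliseconds initial_delay) {
    auto delay = initial_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        try {
            if (try_open()) {
                return true;
            }
        } catch (const std::exception&) {
            // A thrown error (e.g. device not found) counts as a failed attempt.
        }
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(delay);
            delay *= 2;  // simple exponential backoff
        }
    }
    return false;
}
```

If every attempt fails, the device is probably in the crashed, unreachable state described in this comment, and retrying alone will not help.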

guilhermedemouraa commented 2 days ago

I am also happy to set up an offline meeting to explain in greater detail any questions you may have.

jakaskerl commented 1 day ago

Hi @guilhermedemouraa, close it explicitly using dai::Device::close().

Perhaps the destructor is not properly called in the service.

/**
 * Explicitly closes connection to device.
 * @note This function does not need to be explicitly called
 * as destructor closes the device automatically
 */
void close();

Thanks, Jaka
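One way to make sure the explicit close always runs is a small owning wrapper, sketched below. `DeviceSession` and `FakeDevice` are hypothetical names, not part of depthai; the wrapper only assumes that the wrapped type offers `close()` and `isClosed()`, which `dai::Device` does. `FakeDevice` is a stand-in so the sketch can be exercised without hardware.

```cpp
#include <memory>
#include <utility>

// Hypothetical helper: owns a device and closes it explicitly before
// releasing it, rather than relying only on the device's destructor.
template <typename DeviceT>
class DeviceSession {
public:
    explicit DeviceSession(std::unique_ptr<DeviceT> device)
        : device_(std::move(device)) {}

    DeviceSession(const DeviceSession&) = delete;
    DeviceSession& operator=(const DeviceSession&) = delete;

    ~DeviceSession() { shutdown(); }

    // Closes the connection if it is still open; safe to call repeatedly.
    void shutdown() noexcept {
        if (device_ && !device_->isClosed()) {
            try {
                device_->close();
            } catch (...) {
                // Swallow errors so shutdown never throws from a destructor.
            }
        }
        device_.reset();
    }

    DeviceT* get() const { return device_.get(); }
    bool active() const { return device_ != nullptr; }

private:
    std::unique_ptr<DeviceT> device_;
};

// Stand-in for a real device, used only to exercise the helper.
struct FakeDevice {
    int* close_calls;
    bool closed = false;
    explicit FakeDevice(int* counter) : close_calls(counter) {}
    void close() { closed = true; ++*close_calls; }
    bool isClosed() const { return closed; }
};
```

In the service, the session would be shut down (or destroyed) before the next attempt to open the same camera, so the old connection is released before the new one is requested.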