luxonis / depthai-core

DepthAI C++ Library
MIT License

[BUG] OAK-D-POE intermittent failure - INTERNAL_ERROR_CORE #1103

Open laurence-diack-pk opened 3 weeks ago

laurence-diack-pk commented 3 weeks ago

Problem Description

OAK-D-POE cameras intermittently disappear from the network and become unreachable via ping while running. The issue occurs unpredictably.

System Details

Observed Behavior

Crash Dumps

Two crash dumps have been collected, showing the following errors:

INTERNAL_ERROR_CORE RTEMS_FATAL_SOURCE_EXCEPTION

Crash dump files: crashDump_1_1844301031A3DC0E00_c7127782f2da45aac89d5b5b816d04cc45ae40be.json crashDump_0_18443010C15E9F0F00_c7127782f2da45aac89d5b5b816d04cc45ae40be.json

Pipeline

The main app is C++, so I can't get a pipeline grab, but here's the basic setup:

    // Create Nodes
    auto yolospatialdetectionnetwork = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
    auto camrgb = pipeline.create<dai::node::ColorCamera>();
    auto monoleft = pipeline.create<dai::node::MonoCamera>();
    auto monoright = pipeline.create<dai::node::MonoCamera>();
    auto stereo = pipeline.create<dai::node::StereoDepth>();
    auto objecttracker = pipeline.create<dai::node::ObjectTracker>();
    auto imu = pipeline.create<dai::node::IMU>();
    auto videoenc = pipeline.create<dai::node::VideoEncoder>();
    auto videoenc_webui = pipeline.create<dai::node::VideoEncoder>();
    auto videoenc_left = pipeline.create<dai::node::VideoEncoder>();
    auto videoenc_right = pipeline.create<dai::node::VideoEncoder>();
    auto manip = pipeline.create<dai::node::ImageManip>();
    auto xoutdepth = pipeline.create<dai::node::XLinkOut>();
    auto xouttracks = pipeline.create<dai::node::XLinkOut>();
    auto xoutdetections = pipeline.create<dai::node::XLinkOut>();
    auto xoutIMU = pipeline.create<dai::node::XLinkOut>();
    auto xoutvidenc = pipeline.create<dai::node::XLinkOut>();
    auto xoutmonoenc_left = pipeline.create<dai::node::XLinkOut>();
    auto xoutmonoenc_right = pipeline.create<dai::node::XLinkOut>();
    auto xoutvideostream = pipeline.create<dai::node::XLinkOut>();

    // Set stream names for outputs
    xouttracks->setStreamName("tracklets");
    xoutdepth->setStreamName("depth");
    xoutdetections->setStreamName("detections");
    xoutvidenc->setStreamName("vid_enc");
    xoutmonoenc_left->setStreamName("mono_enc_left");
    xoutmonoenc_right->setStreamName("mono_enc_right");
    xoutvideostream->setStreamName("videostream");
    xoutIMU->setStreamName("imu");

    // Set properties for nodes
    camrgb->setPreviewSize(416, 416);
    camrgb->setInterleaved(false);
    camrgb->setColorOrder(dai::ColorCameraProperties::ColorOrder::RGB);
    camrgb->setPreviewKeepAspectRatio(false);

    // Set RGB resolution
    if (camera_.isIMX378())
    {
        camrgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
        camrgb->setIspScale(2, 3); // Scale the RGB output to 2/3 of 1080p for better depth alignment
    }
    else
    {
        camrgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_720_P);
    }

    monoleft->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
    monoleft->setBoardSocket(dai::CameraBoardSocket::CAM_B);
    monoright->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
    monoright->setBoardSocket(dai::CameraBoardSocket::CAM_C);

    camrgb->setFps(fps);
    monoleft->setFps(fps);
    monoright->setFps(fps);
    videoenc->setQuality(93);
    videoenc_left->setQuality(93);
    videoenc_right->setQuality(93);
    videoenc->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
    videoenc_left->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
    videoenc_right->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
    videoenc_webui->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::H264_BASELINE);
    videoenc_webui->setQuality(50);
    videoenc_webui->setFrameRate(fps);
    videoenc_webui->setRateControlMode(dai::VideoEncoderProperties::RateControlMode::CBR);
    auto videoenc_webui_bitrate = 500000;
    auto videoenc_webui_width = 1280;
    auto videoenc_webui_height = 720;
    videoenc_webui->setBitrate(videoenc_webui_bitrate);

    // imu settings
    imu->enableIMUSensor({dai::IMUSensor::ACCELEROMETER_RAW, dai::IMUSensor::GYROSCOPE_RAW}, 200);
    imu->setBatchReportThreshold(1);
    imu->setMaxBatchReports(10);

    // setting node configs
    stereo->setDefaultProfilePreset(dai::node::StereoDepth::PresetMode::HIGH_ACCURACY);
    stereo->setSubpixel(true);
    stereo->setLeftRightCheck(true);
    stereo->left.setQueueSize(1);
    stereo->right.setQueueSize(1);
    stereo->left.setBlocking(false);
    stereo->right.setBlocking(false);
    stereo->setDepthAlign(dai::CameraBoardSocket::CAM_A);
    stereo->setOutputSize(monoleft->getResolutionWidth(), monoleft->getResolutionHeight());
    stereo->useHomographyRectification(false);
    stereo->setConfidenceThreshold(confidence_threshold);
    auto config = stereo->initialConfig.get();
    config.postProcessing.median = dai::MedianFilter::KERNEL_5x5;
    config.postProcessing.temporalFilter.enable = true;
    config.postProcessing.spatialFilter.enable = true;
    config.postProcessing.spatialFilter.holeFillingRadius = 2;
    config.postProcessing.spatialFilter.numIterations = 1;
    config.postProcessing.thresholdFilter.minRange = 300;
    config.postProcessing.thresholdFilter.maxRange = 10000;
    config.postProcessing.decimationFilter.decimationFactor = 3;
    config.postProcessing.decimationFilter.decimationMode = dai::RawStereoDepthConfig::PostProcessing::DecimationFilter::DecimationMode::NON_ZERO_MEDIAN;

    // Set spatial detection network (YOLO) settings
    yolospatialdetectionnetwork->setBlobPath(nn_path);
    // Pub names of classes
    auto nn_classes = getNNClasses(nn_config_path);
    std::unordered_map<std::string, float> detection_confidences;
    // grab default confidence vals
    try {
        detection_confidences = getConfigValue<std::unordered_map<std::string, float>>(config_, {"nn", "default_confidence"});
    } catch (const std::exception& e) {
        ROS_ERROR_STREAM("Error parsing detection confidence: " << e.what());
    }

    grover_msgs::StringArray nn_classes_msg;
    for (const auto& nn_class : nn_classes) {
        nn_classes_msg.data.push_back(nn_class);

        auto it = detection_confidences.find(nn_class);
        if (it != detection_confidences.end()) {
            m_detection_class_conf.push_back(std::make_pair(nn_class, it->second));
        } else {
            m_detection_class_conf.push_back(std::make_pair(nn_class, confidence_threshold));
        }
    }

    m_nn_classes_pub = nh_.advertise<grover_msgs::StringArray>(cam_name_ + "/nn_classes", 1, true);
    m_nn_classes_pub.publish(nn_classes_msg);

    fillNNSettings<dai::node::YoloSpatialDetectionNetwork>(nn_config_path, yolospatialdetectionnetwork);
    yolospatialdetectionnetwork->input.setBlocking(true);
    yolospatialdetectionnetwork->setBoundingBoxScaleFactor(0.5);
    yolospatialdetectionnetwork->setDepthLowerThreshold(150);
    yolospatialdetectionnetwork->setDepthUpperThreshold(15000);
    yolospatialdetectionnetwork->setIouThreshold(0.5f);

    // possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
    objecttracker->setTrackerType(dai::TrackerType::ZERO_TERM_IMAGELESS);
    // take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
    objecttracker->setTrackerIdAssignmentPolicy(dai::TrackerIdAssignmentPolicy::SMALLEST_ID);

    manip->setMaxOutputFrameSize(1382400);
    manip->initialConfig.setResize(1280, 720);
    manip->initialConfig.setFrameType(dai::ImgFrame::Type::NV12);

    monoleft->out.link(stereo->left);
    monoright->out.link(stereo->right);

    camrgb->video.link(manip->inputImage);
    manip->out.link(videoenc_webui->input);
    videoenc_webui->bitstream.link(xoutvideostream->input);
    camrgb->video.link(videoenc->input);

    monoright->out.link(videoenc_right->input);
    monoleft->out.link(videoenc_left->input);
    videoenc->bitstream.link(xoutvidenc->input);
    videoenc_right->bitstream.link(xoutmonoenc_right->input);
    videoenc_left->bitstream.link(xoutmonoenc_left->input);

    stereo->depth.link(xoutdepth->input);

    imu->out.link(xoutIMU->input);

    camrgb->preview.link(yolospatialdetectionnetwork->input);
    stereo->depth.link(yolospatialdetectionnetwork->inputDepth);
    yolospatialdetectionnetwork->passthrough.link(objecttracker->inputTrackerFrame);
    yolospatialdetectionnetwork->passthrough.link(objecttracker->inputDetectionFrame);
    yolospatialdetectionnetwork->out.link(objecttracker->inputDetections);
    yolospatialdetectionnetwork->out.link(xoutdetections->input);

    objecttracker->out.link(xouttracks->input);

Any insights would be greatly appreciated, thanks

moratom commented 3 weeks ago

Thanks for the bug report @laurence-diack-pk !

Just to clarify, the disconnects happen whilst you're running the app, right?

moratom commented 3 weeks ago

@SzabolcsGergely could you take a look at the crash dumps when you have a moment?

SzabolcsGergely commented 2 weeks ago

> @SzabolcsGergely could you take a look at the crash dumps when you have a moment?

The crash occurred during an XLink read, in XLinkPlatformRead; the reason is unknown.
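Until the root cause is known, one host-side mitigation is to catch the XLink read failure and re-open the device. A minimal, self-contained sketch of that retry pattern follows; the `readWithRecovery` helper, its parameters, and the backoff value are illustrative only (not depthai-core API). In practice `read` would wrap something like `queue->get<dai::ImgFrame>()` and `reopen` would re-create the `dai::Device`:

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical helper (not part of depthai-core): retry a read that may
// throw on an XLink failure, running `reopen` between attempts.
// depthai's XLink errors derive from std::runtime_error, so catching that
// should also cover dai::XLinkReadError.
template <typename T>
T readWithRecovery(std::function<T()> read,
                   std::function<void()> reopen,
                   int maxAttempts = 3,
                   std::chrono::milliseconds backoff = std::chrono::milliseconds(1000)) {
    for (int attempt = 1;; ++attempt) {
        try {
            return read();  // e.g. wraps queue->get<dai::ImgFrame>()
        } catch (const std::runtime_error&) {
            if (attempt >= maxAttempts) throw;  // give up after maxAttempts
            std::this_thread::sleep_for(backoff);
            reopen();  // e.g. re-create the dai::Device / pipeline
        }
    }
}
```

This won't fix a device that has dropped off the network entirely, but it keeps the host app alive across transient XLink errors instead of crashing with the stream.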

laurence-diack-pk commented 2 weeks ago

> Thanks for the bug report @laurence-diack-pk !
>
> Just to clarify, the disconnects happen whilst you're running the app, right?

Yeah, so it seems it can happen on pipeline load or mid-run.

It doesn't seem to be a very predictable failure, and I'm having a hard time reproducing it consistently. For example, I'm looking at an instance right now where one of two cameras has disappeared, but I had to restart the host several times to get it into this state.

Also, the crash dumps may not correlate 1:1 with this failure, as I've observed cases where the camera disappears and no crash dump is retrieved.

Sorry for the vagueness; it's just sort of a black box from my end. If I can't communicate with the camera over the network, it's hard to tell exactly what's going on.

I was wondering if there's any additional logging I can pull off the device itself, or perhaps some way I could use the M8 connector to debug over UART or USB, so I can get some insight into the state of the camera when it disappears like that.
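On the logging question: depthai exposes device-side logging that is forwarded to the host, which may help capture the state leading up to a disconnect. A configuration sketch, assuming `pipeline` is the pipeline built above (`setLogLevel`/`setLogOutputLevel` are existing depthai-core calls, but exact behavior may vary by firmware version):

```cpp
#include <depthai/depthai.hpp>

// Configuration sketch: raise device-side log verbosity and forward it
// to the host process's output.
dai::Device device(pipeline);
device.setLogLevel(dai::LogLevel::DEBUG);        // verbosity on the device
device.setLogOutputLevel(dai::LogLevel::DEBUG);  // what gets printed on the host
```

Setting the `DEPTHAI_LEVEL=debug` environment variable should have a similar effect without code changes. Note that these logs only flow while the camera is still reachable, so they would need to be captured continuously to see the moments just before it drops off the network.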