luxonis / depthai-core

DepthAI C++ Library

[BUG] Watchdog is not disabled in bootloader #794

Open mrmorawski opened 1 year ago

mrmorawski commented 1 year ago

Describe the bug

When operating OAK-D PoE devices over unreliable connections, the docs recommend setting environment variables to modify the XLink watchdog timeout. However, these variables only modify the watchdog timeouts in DeviceBase.cpp, not in DeviceBootloader.cpp.

When one connects to a device over XLink, it flashes a bootloader anyway. If the connection has high latency, it will fail during that step regardless of whether the DEPTHAI_WATCHDOG environment variable was set to a more conservative value than the default.

Minimal Reproducible Example

Connect to a device over a high-latency connection (e.g. over 4G) and try to flash a bootloader. It fails with a connection timeout, even if the DEPTHAI_WATCHDOG environment variable is set to 0.

    import depthai as dai

    camera_ip = "192.168.1.162"  # placeholder: IP address of the PoE camera

    try:
        # Try connecting with the device in the X_LINK_FLASH_BOOTED state first
        device_info = dai.DeviceInfo(
            name=camera_ip,
            mxid="",
            state=dai.XLinkDeviceState.X_LINK_FLASH_BOOTED,
            protocol=dai.XLinkProtocol.X_LINK_TCP_IP,
            platform=dai.XLinkPlatform.X_LINK_MYRIAD_X,
            status=dai.XLinkError_t.X_LINK_SUCCESS,
        )
        bootloader = dai.DeviceBootloader(device_info)
    except RuntimeError:
        # Fall back to the X_LINK_BOOTLOADER state
        device_info = dai.DeviceInfo(
            name=camera_ip,
            mxid="",
            state=dai.XLinkDeviceState.X_LINK_BOOTLOADER,
            protocol=dai.XLinkProtocol.X_LINK_TCP_IP,
            platform=dai.XLinkPlatform.X_LINK_MYRIAD_X,
            status=dai.XLinkError_t.X_LINK_SUCCESS,
        )
        bootloader = dai.DeviceBootloader(device_info)

    # Flash an empty pipeline and print the progress as a percentage
    progress = lambda p: print(f"Flashing progress: {p*100:.1f}%")
    bootloader.flash(progress, dai.Pipeline())

Expected behavior

The watchdog timeout setting should also apply to the bootloader code path.

Here is an example solution that we're working with now (it only disables the watchdog, doesn't change the value): https://github.com/luxonis/depthai-core/compare/main...mrmorawski:depthai-core:no_bootloader_watchdog

Attach system log

DEPTHAI_LEVEL=debug DEPTHAI_WATCHDOG=0 poetry run python
Python 3.10.7 (main, Nov 24 2022, 19:45:47) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
> >>> import depthai as dai
> [2023-03-18 12:18:33.686] [debug] Python bindings - version: 2.20.2.0 from  build: 2023-01-31 23:58:49 +0000
> [2023-03-18 12:18:33.686] [debug] Library information - version: 2.20.2, commit: 4ff860838726a5e8ac0cbe59128c58a8f6143c6c from 2023-01-31 22:20:03 +0200, build: 2023-01-31 23:34:39 +0000
> [2023-03-18 12:18:33.687] [debug] Initialize - finished
> >>> [2023-03-18 12:18:33.757] [debug] Resources - Archive 'depthai-bootloader-fwp-0.0.24.tar.xz' open: 1ms, archive read: 68ms
> [2023-03-18 12:18:34.049] [debug] Resources - Archive 'depthai-device-fwp-8c3d6ac1c77b0bf7f9ea6fd4d962af37663d2fbd.tar.xz' open: 1ms, archive read: 360ms
> 
> >>> info = dai.DeviceInfo("192.168.1.162")
> >>> dev = dai.Device(info)
> [2023-03-18 12:19:41.424] [debug] Found an actual device by given DeviceInfo: DeviceInfo(name=192.168.1.162, mxid=194430108152761300, X_LINK_BOOTLOADER, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS)
> [2023-03-18 12:19:41.424] [debug] Device - OpenVINO version: universal
> [2023-03-18 12:19:41.424] [warning] Watchdog disabled! In case of unclean exit, the device needs reset or power-cycle for next run
> [2023-03-18 12:19:41.424] [debug] Device - BoardConfig: {"camera":[],"emmc":null,"gpio":[],"logDevicePrints":null,"logPath":null,"logSizeMax":null,"logVerbosity":null,"network":{"mtu":0,"xlinkTcpNoDelay":true},"nonExclusiveMode":false,"pcieInternalClock":null,"sysctl":[],"uart":[],"usb":{"flashBootedPid":63037,"flashBootedVid":999,"maxSpeed":4,"pid":63035,"vid":999},"usb3PhyInternalClock":null,"watchdogInitialDelayMs":null,"watchdogTimeoutMs":0} 
> libnop:
> 0000: b9 10 b9 05 81 e7 03 81 3b f6 81 e7 03 81 3d f6 04 b9 02 00 01 ba 00 00 be bb 00 bb 00 be be be
> 0020: be be be be 00 bb 00
> [2023-03-18 12:19:41.438] [debug] Searching for booted device: DeviceInfo(name=192.168.1.162, mxid=REMOVED, X_LINK_BOOTLOADER, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
> [2023-03-18 12:19:42.591] [debug] Connected bootloader version 0.0.24
> [2023-03-18 12:19:57.595] [warning] Monitor thread (device: REMOVED [192.168.1.162]) - ping was missed, closing the device connection
> [2023-03-18 12:19:59.346] [debug] DeviceBootloader about to be closed...
> [2023-03-18 12:19:59.346] [debug] XLinkResetRemote of linkId: (0)
> [2023-03-18 12:20:00.599] [debug] DeviceBootloader closed, 1252
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> depthai.XLinkWriteError: Couldn't write data to stream: '__bootloader' (X_LINK_ERROR)
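
The time between the bootloader connection and the missed ping in the log above can be measured directly. The sketch below is a small standalone helper, not part of depthai; `log_time` and `seconds_between` are names made up for illustration, and the two input strings are copied from the log without the leading quote markers.

```python
import re
from datetime import datetime

# Timestamp format used in the debug log, e.g. [2023-03-18 12:19:42.591]
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def log_time(line: str) -> datetime:
    """Extract the first timestamp found in one log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds from the first log line to the second."""
    return (log_time(second) - log_time(first)).total_seconds()


connected = "[2023-03-18 12:19:42.591] [debug] Connected bootloader version 0.0.24"
missed = (
    "[2023-03-18 12:19:57.595] [warning] Monitor thread "
    "(device: REMOVED [192.168.1.162]) - ping was missed, "
    "closing the device connection"
)

gap = seconds_between(connected, missed)
print(f"{gap:.3f} s")  # 15.004 s
```

The connection is closed about 15 seconds after the bootloader connects, even though the earlier warning reports the watchdog as disabled. That is consistent with a separate timeout on the bootloader path, although the exact value is not stated in this thread.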

Additional context

For our particular application, we're running the cameras in standalone mode, so we don't need an XLink connection to stream frames. We only need to be able to update pipelines on the cameras, which I assume will be the use case of most people running PoE devices in standalone mode.

mrmorawski commented 1 year ago

I don't know the depthai-core library well enough to submit a full PR, but the hack listed in 'expected behaviour' works well for us.

Would just mirroring the adjustable watchdog code from DeviceBase.cpp work for DeviceBootloader.cpp as well?

themarpe commented 1 year ago

Thanks for the proposed solution @mrmorawski. There is one issue with this approach: it is not completely "WD disabled", as the BL still has its own WD counting down. The BL does bump its WD on any comms that it sends or receives successfully, though, so the chance of it timing out is smaller.
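
The "bump on successful comms" behaviour can be pictured with a toy model. This is only a sketch of the general countdown-and-reset idea, not the bootloader's actual implementation; the `Watchdog` class, the `survives` helper, and the 10-second timeout are all hypothetical.

```python
class Watchdog:
    """Toy countdown watchdog that is reset ("bumped") on activity."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_bump = now

    def bump(self, now: float) -> None:
        """Record a successful send or receive at time `now`."""
        self.last_bump = now

    def expired(self, now: float) -> bool:
        """True once more than `timeout_s` has passed since the last bump."""
        return now - self.last_bump > self.timeout_s


def survives(timeout_s: float, message_times: list, end_time: float) -> bool:
    """Return True if the watchdog never expires for the given traffic."""
    wd = Watchdog(timeout_s)
    for t in sorted(message_times):
        if wd.expired(t):
            return False  # the gap before this message was too long
        wd.bump(t)
    return not wd.expired(end_time)


# Hypothetical 10 s timeout: steady traffic keeps the watchdog alive...
steady = survives(10.0, [4, 8, 12, 16, 20], end_time=20)
# ...but a single 12 s stall (between t=4 and t=16) lets it expire.
stalled = survives(10.0, [4, 16, 20], end_time=20)
print(steady, stalled)  # True False
```

In this model a high-latency link only trips the watchdog when one gap between successful messages exceeds the timeout, which matches the point that a timeout becomes less likely but not impossible.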

Perhaps it could be reworked and put under a DEPTHAI_BOOTLOADER_WATCHDOG variable, or the WD timeout could simply be extended further. Have you tried the latter, and if so, what timeout worked for you?
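
One way the suggested variable could be resolved is sketched below in Python for illustration (the library itself is C++). DEPTHAI_BOOTLOADER_WATCHDOG is only a proposal in this thread; the function name `watchdog_timeout_ms`, the fallback to DEPTHAI_WATCHDOG, and the 1500 ms default are assumptions made for the example, not the library's behaviour.

```python
from typing import Mapping


def watchdog_timeout_ms(
    env: Mapping[str, str], for_bootloader: bool, default_ms: int
) -> int:
    """Pick a watchdog timeout in milliseconds; 0 means "disabled".

    Hypothetical lookup order: the bootloader path checks the proposed
    DEPTHAI_BOOTLOADER_WATCHDOG first, then falls back to DEPTHAI_WATCHDOG.
    """
    names = ["DEPTHAI_WATCHDOG"]
    if for_bootloader:
        names.insert(0, "DEPTHAI_BOOTLOADER_WATCHDOG")

    for name in names:
        raw = env.get(name)
        if raw is None or raw.strip() == "":
            continue  # variable not set
        try:
            value = int(raw)
        except ValueError:
            continue  # ignore values that are not integers
        if value >= 0:
            return value
    return default_ms


DEFAULT_MS = 1500  # placeholder, not the library's real default

# Only the existing variable is set: both paths see "disabled".
print(watchdog_timeout_ms({"DEPTHAI_WATCHDOG": "0"}, True, DEFAULT_MS))  # 0

# Both set: the bootloader path uses its own, longer timeout.
env = {"DEPTHAI_WATCHDOG": "0", "DEPTHAI_BOOTLOADER_WATCHDOG": "30000"}
print(watchdog_timeout_ms(env, True, DEFAULT_MS))   # 30000
print(watchdog_timeout_ms(env, False, DEFAULT_MS))  # 0
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the lookup easy to test with different settings.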

mrmorawski commented 1 year ago

Thanks for the quick reply, @themarpe. I'm aware it's not a full solution, just a quick hack that I put in to make sure it works at all. I may try to experiment with different timeouts next week.