luxonis / XLink

A cross-platform library for communicating with devices over various physical links.
Apache License 2.0
11 stars 17 forks source link

Multi-threading fix, link IDs were reused in certain cases #73

Closed alex-luxonis closed 8 months ago

alex-luxonis commented 8 months ago

Fix getNextAvailableLinkUniqueId to always increment the next returned id. In certain cases it returned the previous link id, and even if the connection got closed (mainly by device after FW boot), there might have been queued transfers (like the blocking readback after bootMemory in DepthAI) interfering with the newly open connection on the same id.

@diablodale's comment in https://github.com/luxonis/XLink/commit/5d2d5023aff5caed817a47fae36e761347af8018 also relevant:

TODO investigate race condition that is (probably) later caught
due to changing the global xLinkDesc_t availableXLinks[MAX_LINKS]
without any thread protection. The dispatcher thread can be calling this function
while an app thread calls XLinkReadData() which calls getLinkByStreamId() which calls
both getLinkById() and getXLinkState(). The latter two read the global availableXLinks and depending on the two threads execution timing could result in the xlink being invalidated after the app's thread did the "is xlink valid" test. This leads to the app's thread
creating an xLinkEvent_t with outdated xlink info. When that event gets to the
event processing loop, the validity of the xlink state will be checked again and be handled

Fixes multi-threading failures (observed on PoE devices), and might fix some other bootup issues.

DepthAI code snippet repro:

    std::vector<std::string> devices = {
        "192.168.8.129",
        "192.168.8.130",
        "192.168.8.131",
        "192.168.8.132",
        "192.168.8.133",
    };
    std::vector<std::thread> threads;
    for (const auto& devName : devices) {
        threads.push_back(std::thread([devName](){
            printf("===== %s: thread started\n", devName.c_str());
            auto connected = dai::XLinkConnection::getAllConnectedDevices();
            dai::DeviceInfo selected;
            for (const auto& devInfo : connected) {
                if (devInfo.name == devName) {
                    selected = devInfo;
                    break;
                }
            }
            // scoped, to close bootloader
            {
                dai::DeviceBootloader bl(selected);
                auto ver = bl.getVersion().toString();
                printf("===== %s: bootloader version %s\n", devName.c_str(), ver.c_str());
            }
            dai::Device dev(selected);
            printf("===== %s: device started\n", devName.c_str());
            while(1) sleep(1);
        }));

        // usleep(100*1000); // some wait between starting threads
    }
    while(1) sleep(1);

Some local logs (depthai-core, XLink), showing one of the failures. Note the reuse of linkId 4 for two devices, successfully booting 192.168.8.130, then failing 192.168.8.129:

===== 192.168.8.129: thread started
===== 192.168.8.130: thread started
===== 192.168.8.132: thread started
===== 192.168.8.133: thread started
===== 192.168.8.131: thread started

[    367713] [rgb_preview] getNextAvailableLink:615 DBG got link id 0 (nextUniqueLinkId = 0) saved on availableXLinks[0] = 0x5576079209a0
[    367713] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.133 >>> got linkId 0

[    367721] [rgb_preview] getNextAvailableLink:615 DBG got link id 1 (nextUniqueLinkId = 1) saved on availableXLinks[1] = 0x557607939c40
[    367721] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.132 >>> got linkId 1

[    367721] [rgb_preview] getNextAvailableLink:615 DBG got link id 2 (nextUniqueLinkId = 2) saved on availableXLinks[2] = 0x557607952ee0
[    367721] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.130 >>> got linkId 2

[    367725] [rgb_preview] getNextAvailableLink:615 DBG got link id 3 (nextUniqueLinkId = 3) saved on availableXLinks[3] = 0x55760796c180
[    367725] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.129 >>> got linkId 3

[    367729] [rgb_preview] getNextAvailableLink:615 DBG got link id 4 (nextUniqueLinkId = 4) saved on availableXLinks[4] = 0x557607985420
[    367729] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.131 >>> got linkId 4

===== 192.168.8.133: bootloader version 0.0.26
[    367947] [Scheduler00Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5576079209a0 with id 0
===== 192.168.8.131: bootloader version 0.0.26
===== 192.168.8.132: bootloader version 0.0.26
===== 192.168.8.130: bootloader version 0.0.26
===== 192.168.8.129: bootloader version 0.0.26
[    367962] [Scheduler02Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x557607952ee0 with id 2
[    367962] [Scheduler04Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x557607985420 with id 4
[    367963] [Scheduler01Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x557607939c40 with id 1
[    367963] [Scheduler03Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x55760796c180 with id 3

[    377831] [rgb_preview] getNextAvailableLink:615 DBG got link id 4 (nextUniqueLinkId = 4) saved on availableXLinks[0] = 0x5576079209a0
[    377831] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.130 >>> got linkId 4
=================================== 192.168.8.130 DeviceBootloader::bootMemory start...
=================================== 192.168.8.130 DeviceBootloader::bootMemory DONE
[    378974] [Scheduler00Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5576079209a0 with id 4

[    379959] [rgb_preview] getNextAvailableLink:615 DBG got link id 4 (nextUniqueLinkId = 4) saved on availableXLinks[0] = 0x5576079209a0
[    379959] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.129 >>> got linkId 4

[    379991] [rgb_preview] getNextAvailableLink:615 DBG got link id 5 (nextUniqueLinkId = 5) saved on availableXLinks[1] = 0x557607939c40
[    379991] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.133 >>> got linkId 5

[    379999] [rgb_preview] getNextAvailableLink:615 DBG got link id 6 (nextUniqueLinkId = 6) saved on availableXLinks[2] = 0x557607952ee0
[    379999] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.132 >>> got linkId 6

=================================== 192.168.8.129 DeviceBootloader::bootMemory start...
=================================== 192.168.8.133 DeviceBootloader::bootMemory start...
=================================== 192.168.8.132 DeviceBootloader::bootMemory start...
[    380451] [rgb_preview] getNextAvailableLink:615 DBG got link id 7 (nextUniqueLinkId = 7) saved on availableXLinks[3] = 0x55760796c180
[    380451] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.131 >>> got linkId 7

XLinkStream::writeSplit -- XLinkWriteData ERR on streamId 0x4000000, 0x7f807dc95010 + 10485760, size 5242880
terminate called after throwing an instance of 'dai::XLinkWriteError'
  what():  Couldn't write data to stream: '__bootloader' (X_LINK_ERROR)

Logs after the fix:

===== 192.168.8.132: thread started
===== 192.168.8.133: thread started
===== 192.168.8.129: thread started
===== 192.168.8.130: thread started
===== 192.168.8.131: thread started

[    974233] [rgb_preview] getNextAvailableLink:615 DBG got link id 0 (nextUniqueLinkId = 1) saved on availableXLinks[0] = 0x5569246c09a0
[    974233] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.130 >>> got linkId 0

[    974233] [rgb_preview] getNextAvailableLink:615 DBG got link id 1 (nextUniqueLinkId = 2) saved on availableXLinks[1] = 0x5569246d9c40
[    974233] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.129 >>> got linkId 1

[    974241] [rgb_preview] getNextAvailableLink:615 DBG got link id 2 (nextUniqueLinkId = 3) saved on availableXLinks[2] = 0x5569246f2ee0
[    974241] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.131 >>> got linkId 2

[    974241] [rgb_preview] getNextAvailableLink:615 DBG got link id 3 (nextUniqueLinkId = 4) saved on availableXLinks[3] = 0x55692470c180
[    974241] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.133 >>> got linkId 3

[    974241] [rgb_preview] getNextAvailableLink:615 DBG got link id 4 (nextUniqueLinkId = 5) saved on availableXLinks[4] = 0x556924725420
[    974241] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.132 >>> got linkId 4

===== 192.168.8.130: bootloader version 0.0.26
===== 192.168.8.129: bootloader version 0.0.26
[    974463] [Scheduler00Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246d9c40 with id 1
[    974464] [Scheduler01Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246c09a0 with id 0
===== 192.168.8.132: bootloader version 0.0.26
===== 192.168.8.131: bootloader version 0.0.26
===== 192.168.8.133: bootloader version 0.0.26
[    974470] [Scheduler03Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x55692470c180 with id 3
[    974470] [Scheduler04Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x556924725420 with id 4
[    974471] [Scheduler02Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246f2ee0 with id 2

[    984329] [rgb_preview] getNextAvailableLink:615 DBG got link id 5 (nextUniqueLinkId = 6) saved on availableXLinks[0] = 0x5569246c09a0
[    984329] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.130 >>> got linkId 5
=================================== 192.168.8.130 DeviceBootloader::bootMemory start...
=================================== 192.168.8.130 DeviceBootloader::bootMemory DONE
[    985514] [Scheduler00Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246c09a0 with id 5

[    986445] [rgb_preview] getNextAvailableLink:615 DBG got link id 6 (nextUniqueLinkId = 7) saved on availableXLinks[0] = 0x5569246c09a0
[    986445] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.129 >>> got linkId 6

[    986471] [rgb_preview] getNextAvailableLink:615 DBG got link id 7 (nextUniqueLinkId = 8) saved on availableXLinks[1] = 0x5569246d9c40
[    986471] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.131 >>> got linkId 7

[    986491] [rgb_preview] getNextAvailableLink:615 DBG got link id 8 (nextUniqueLinkId = 9) saved on availableXLinks[2] = 0x5569246f2ee0
[    986491] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.132 >>> got linkId 8

[    986507] [rgb_preview] getNextAvailableLink:615 DBG got link id 9 (nextUniqueLinkId = 10) saved on availableXLinks[3] = 0x55692470c180
[    986507] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.133 >>> got linkId 9

=================================== 192.168.8.129 DeviceBootloader::bootMemory start...
=================================== 192.168.8.131 DeviceBootloader::bootMemory start...
=================================== 192.168.8.132 DeviceBootloader::bootMemory start...
=================================== 192.168.8.133 DeviceBootloader::bootMemory start...
=================================== 192.168.8.129 DeviceBootloader::bootMemory DONE
[    987622] [Scheduler00Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246c09a0 with id 6
=================================== 192.168.8.132 DeviceBootloader::bootMemory DONE
=================================== 192.168.8.133 DeviceBootloader::bootMemory DONE
[    987649] [Scheduler02Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246f2ee0 with id 8
[    987650] [Scheduler03Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x55692470c180 with id 9
=================================== 192.168.8.131 DeviceBootloader::bootMemory DONE
[    987722] [Scheduler01Thr] dispatcherCloseLink:519   DBG fullClose=1, link 0x5569246d9c40 with id 7

[    993883] [rgb_preview] getNextAvailableLink:615 DBG got link id 10 (nextUniqueLinkId = 11) saved on availableXLinks[0] = 0x5569246c09a0
[    993883] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.130 >>> got linkId 10

===== 192.168.8.130: device started
[    995699] [rgb_preview] getNextAvailableLink:615 DBG got link id 11 (nextUniqueLinkId = 12) saved on availableXLinks[1] = 0x5569246d9c40
[    995699] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.133 >>> got linkId 11

===== 192.168.8.133: device started
[    996161] [rgb_preview] getNextAvailableLink:615 DBG got link id 12 (nextUniqueLinkId = 13) saved on availableXLinks[2] = 0x5569246f2ee0
[    996161] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.131 >>> got linkId 12

===== 192.168.8.131: device started
[    996215] [rgb_preview] getNextAvailableLink:615 DBG got link id 13 (nextUniqueLinkId = 14) saved on availableXLinks[3] = 0x55692470c180
[    996215] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.132 >>> got linkId 13

[    996215] [rgb_preview] getNextAvailableLink:615 DBG got link id 14 (nextUniqueLinkId = 15) saved on availableXLinks[4] = 0x556924725420
[    996215] [rgb_preview] XLinkConnect:238 ======= DBG device name 192.168.8.129 >>> got linkId 14

===== 192.168.8.132: device started
===== 192.168.8.129: device started