luxonis / XLink

A cross-platform library for communicating with devices over various physical links.

FD sharing protocol on localhost #82

Closed TheMutta closed 3 weeks ago

TheMutta commented 3 weeks ago

This update adds a new interface for streaming over files, and a new protocol for hosts where both the Server and Client instances of XLink run on the same device: instead of streaming the file data over, it simply sends the FD itself along with its ownership. This allows memory buffers to be shared without copying, i.e. zero-copy.
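For context, the mechanism underneath is the standard POSIX one: an open file descriptor can be handed to another process as SCM_RIGHTS ancillary data over a Unix domain socket. A minimal sketch of the sending side, illustrative only and not XLink's actual code:

#include <sys/socket.h>
#include <cstring>

// Send an open fd to a peer over a connected AF_UNIX socket.
static int sendFd(int sock, int fdToSend) {
    char dummy = 0;
    iovec iov{ &dummy, 1 };            // at least one byte of real payload

    char ctrl[CMSG_SPACE(sizeof(int))] = {};
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;      // ancillary payload is a file descriptor
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cmsg), &fdToSend, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}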

Automatic discovery of the protocol is set up for TCP/IP.

Note that the XLink packet layout has been altered by adding a new FD attribute, meaning that older versions of XLink won't be able to parse the packets correctly.
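To illustrate the compatibility break (field names here are hypothetical, not the actual XLink layout): once a new field is appended to the packet descriptor, an old peer computes a different packet size and misreads everything after it.

#include <cstdint>

typedef struct {
    uint32_t streamId;
    uint32_t size;
    int32_t  fd;   // new field: absent in older XLink builds, so an old peer
                   // parses the bytes that follow at the wrong offset
} packetHeader_t;  // hypothetical name, for illustration only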

TheMutta commented 3 weeks ago

Instead of connect/disconnect/reconnect, let's spin up both TCP & SHDMEM, via the following mechanism (C++):

  • Create two threads and return the fds for the server sockets back to the main context
  • Use a cond_var to see which completes sooner
  • Close the other one that has not completed yet
  • Join both threads
  • Continue & return which protocol won

Ok, that sounds doable, but I have some thoughts. If we're on localhost and the TCP/IP connection resolves faster than the SHDMEM connection, which is entirely possible since the TCP/IP thread could be scheduled first and therefore be the first to complete, then we'd lose all the benefits of zero-copy. Apart from that, we also risk that on the server side the TCP/IP thread completes first while on the client side SHDMEM completes first, which can't be excluded on a multicore system. And on a single core, whichever connects first would be essentially arbitrary. That would be detrimental.

For this dual option, we might create a separate protocol entry to signify going down this path (so TCP_IP would stay plain TCP, SHDMEM plain shared memory, and TCP_IP_OR_LOCAL_SHDMEM would be the grouped one).
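An illustrative sketch of such an entry; the real XLinkProtocol_t enum has more members, and the grouped name here just follows the discussion:

typedef enum {
    X_LINK_TCP_IP,                   // plain TCP
    X_LINK_LOCAL_SHDMEM,             // plain shared memory
    X_LINK_TCP_IP_OR_LOCAL_SHDMEM,   // grouped: race both, keep whichever connects
} XLinkProtocol_t;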

For the connecting client it would be similar: if TCP_IP_OR_LOCAL_SHDMEM is picked, the following occurs:

First try connecting via local SHDMEM; if that fails, go the TCP/IP route. So there's no need for threading on the client side.

I'd prefer an approach similar to this, like the one I put in, which would only require some minor changes to depthai and would avoid all the shenanigans that multithreading adds on a multicore system. Maybe adding a timeout for the SHDMEM connection of, for example, 5 seconds would be perfect, since the SHDMEM connection can only happen after TCP/IP has connected, meaning we already have both a server and a client in place.
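A rough sketch of that client-side order under these assumptions, with the connect calls passed in as stand-ins for the real platform functions (names are illustrative, not XLink's API):

#include <chrono>
#include <functional>
#include <thread>

// connectShdmem/connectTcpip stand in for the real platform connect calls;
// each returns a connected fd, or -1 on failure.
static int connectPreferShdmem(const std::function<int()> &connectShdmem,
                               const std::function<int()> &connectTcpip,
                               int timeoutMs, int retryDelayMs) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(timeoutMs);

    // Try the zero-copy SHDMEM path first, within the timeout window...
    while (clock::now() < deadline) {
        int fd = connectShdmem();
        if (fd >= 0) return fd;
        std::this_thread::sleep_for(std::chrono::milliseconds(retryDelayMs));
    }
    // ...then fall back to plain TCP/IP.
    return connectTcpip();
}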

themarpe commented 3 weeks ago

If we're on localhost and the TCP/IP connection resolves faster than the SHDMEM connection, which is entirely possible since the TCP/IP thread could be scheduled first and therefore be the first to complete, then we'd lose all the benefits of zero-copy. Apart from that, we also risk that on the server side the TCP/IP thread completes first while on the client side SHDMEM completes first, which can't be excluded on a multicore system. And on a single core, whichever connects first would be essentially arbitrary. That would be detrimental.

So, WRT the server, it always waits for both - the timing doesn't matter, just the sequence. The client will try SHDMEM first, and if that succeeds, then the server will wrap up and not wait for TCPIP.


I'd prefer an approach similar to this, like the one I put in, which would only require some minor changes to depthai and would avoid all the shenanigans that multithreading adds on a multicore system. Maybe adding a timeout for the SHDMEM connection of, for example, 5 seconds would be perfect, since the SHDMEM connection can only happen after TCP/IP has connected, meaning we already have both a server and a client in place.

The above case does cover that - only one protocol at once. Also, the SHDMEM protocol fully supports all read/write ops, right? (there is no need to have TCP alongside it, correct?)

TheMutta commented 3 weeks ago

Also, the SHDMEM protocol fully supports all read/write ops, right? (there is no need to have TCP alongside it, correct?)

Yes, exactly - it's a completely independent protocol, and reads/writes work perfectly. The only use for TCP/IP is to determine whether we're connecting to localhost, and to act as a backup if the SHDMEM connection doesn't work. That means the connection is immediate if we're not connecting to localhost, and only takes the minimum time to open a socket if we want to connect through SHDMEM.

So, WRT the server, it always waits for both - the timing doesn't matter, just the sequence. The client will try SHDMEM first, and if that succeeds, then the server will wrap up and not wait for TCPIP.

Ok sure, that seems doable. It's a bit more complicated than the current system, and would function similarly. I still think the present system may be less of a hassle; with the multithreaded approach we'll have to think about cleaning up open sockets that were never connected to. In any case, it's perfectly doable.

TheMutta commented 3 weeks ago

So this is what I came up with:

typedef struct {
    volatile bool lock;
    const char **devPathRead;
    const char **devPathWrite;
    XLinkProtocol_t **protocol;
    void **fd;
} args_t;

/* pthread entry points must have the signature void *(*)(void *) */
static void *tcpipServer(void *arg) {
    args_t *args = (args_t*)arg;
    if(tcpipPlatformServer(*(args->protocol), *(args->devPathRead), *(args->devPathWrite), args->fd) == X_LINK_SUCCESS) {
        args->lock = true;
    }
    return NULL;
}

static void *shdmemServer(void *arg) {
    args_t *args = (args_t*)arg;
    if(shdmemPlatformServer(SHDMEM_DEFAULT_SOCKET, SHDMEM_DEFAULT_SOCKET, args->fd) == X_LINK_SUCCESS) {
        args->lock = true;
        shdmemSetProtocol(*(args->protocol), *(args->devPathRead), *(args->devPathWrite));
    }
    return NULL;
}

xLinkPlatformErrorCode_t XLinkPlatformServer(const char* devPathRead, const char* devPathWrite, XLinkProtocol_t *protocol, void** fd)
{
    switch (*protocol) {
        case X_LINK_TCP_IP: {
            pthread_t shdmemThread;
            pthread_t tcpipThread;

            args_t args;
            args.lock = false;
            args.devPathRead = &devPathRead;
            args.devPathWrite = &devPathWrite;
            args.protocol = &protocol;
            args.fd = fd;

            pthread_create(&shdmemThread, NULL, shdmemServer, (void*)&args);
            pthread_create(&tcpipThread, NULL, tcpipServer, (void*)&args);

            /* Spin until whichever server thread connects first flips the flag */
            while(!args.lock) {
            }

            /* The winning thread has updated *protocol; join it, cancel the loser */
            if (*protocol == X_LINK_TCP_IP) {
                pthread_join(tcpipThread, NULL);
                pthread_cancel(shdmemThread);
                return X_LINK_SUCCESS;
            } else if (*protocol == X_LINK_LOCAL_SHDMEM) {
                pthread_join(shdmemThread, NULL);
                pthread_cancel(tcpipThread);
                return X_LINK_SUCCESS;
            } else {
                return X_LINK_PLATFORM_ERROR;
            }
        }
        ....

The only issue would be cleaning up the remaining open sockets. This can only be done with cleanup handlers inside the tcpipPlatformServer and shdmemPlatformServer functions. The catch is that pthread_cleanup_push and pthread_cleanup_pop are typically implemented as macros expanding to the opening and closing halves of a do { ... } while block, so they must appear as a matched pair in the same lexical scope; an unmatched push produces baffling errors like error: expected 'while' before 'listen', which I couldn't believe at first. In this mixed C/C++ codebase that makes the cleanup handlers impractical to use here.
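For reference, a minimal sketch of the pairing constraint (not XLink code; the worker body is a placeholder):

#include <pthread.h>
#include <unistd.h>

static void closeFd(void *arg) { close(*(int *)arg); }

static void *worker(void *arg) {
    int sock = *(int *)arg;
    /* push/pop expand to the two halves of a do { ... } while block,
       so they must be matched within the same scope */
    pthread_cleanup_push(closeFd, &sock);
    /* ... blocking listen/accept work that may be cancelled ... */
    pthread_cleanup_pop(1); /* 1 = also run the handler on normal exit */
    return NULL;
}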

Therefore, our multithreading approach seems more trouble than it's worth.

TheMutta commented 3 weeks ago

Don't cancel the thread, but "close" the fd and just join both threads as is. Also, it might be simpler to do this in a C++ context due to thread, mutex & condition_variable, etc...

I think I misunderstood - you mean that we'd spin up a separate thread waiting for the SHDMEM connection and, if it succeeds, we just close the TCP/IP connection, create a new platform FD, and switch to that protocol?

Otherwise, if it's really a PITA, we can leave it as is - but the "sleep" should at least be "magic'd out" & added to the comments as a "future impl." possibility, i.e. that we can go to spinning both protocols up in parallel.

It probably is more of a PITA than it's worth, at least imo. For the sleep, I think we could hide it in the shdmem protocol, with multiple retries separated by a fixed delay of a few ms inside connect().
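A minimal sketch of that retry-inside-connect idea, assuming a Unix domain socket path (hypothetical helper name, not the actual XLink function):

#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <chrono>
#include <cstring>
#include <thread>

// Retry connect() to the SHDMEM rendezvous socket with a fixed delay.
// Returns a connected fd, or -1 so the caller can fall back to TCP/IP.
static int shdmemConnectWithRetry(const char *socketPath, int maxRetries, int delayMs) {
    for (int attempt = 0; attempt < maxRetries; ++attempt) {
        int sock = socket(AF_UNIX, SOCK_STREAM, 0);
        if (sock < 0) return -1;

        sockaddr_un addr{};
        addr.sun_family = AF_UNIX;
        std::strncpy(addr.sun_path, socketPath, sizeof(addr.sun_path) - 1);

        if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            return sock;  // connected: caller owns the fd
        }
        close(sock);
        std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
    }
    return -1;
}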

Also, this is POSIX-only I assume - modify CMakeLists.txt to only activate it if POSIX, make it optionally disable-able as well, & add it to CI & tests (which will make sure that we don't compile it in for Windows as well, etc...)

Yes, I guess a simple preprocessor directive like #if defined(__unix__) will do the trick, since the code doesn't work on any other platform anyway.
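A sketch of what that guard could look like, assuming a hypothetical XLINK_ENABLE_SHDMEM macro wired up from a CMake option (names are illustrative):

// Compile the shared-memory protocol only on POSIX, and only when the
// (hypothetical) CMake option has defined XLINK_ENABLE_SHDMEM.
#if defined(__unix__) && defined(XLINK_ENABLE_SHDMEM)
// ... shared-memory protocol implementation ...
#endif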

TheMutta commented 3 weeks ago

Don't cancel the thread, but "close" the fd and just join both threads as is. Also, it might be simpler to do this in a C++ context due to thread, mutex & condition_variable, etc...

So, that has been completed. It was indeed far easier to do in C++; I just had to make some minor modifications to ensure the socket is closed properly and the unused thread is detached.
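For reference, a rough sketch of that close-and-join shape in C++ (std::thread + condition_variable), with shutdown() used to unblock the losing accept(); names and structure are illustrative, not the actual PR code:

#include <sys/socket.h>
#include <unistd.h>
#include <condition_variable>
#include <mutex>
#include <thread>

enum class Protocol { None, TcpIp, Shdmem };

struct ServerRace {
    std::mutex m;
    std::condition_variable cv;
    Protocol winner = Protocol::None;
    int connectedFd = -1;
};

// Each server thread blocks in accept(); the first one to get a client wins.
static void serveOne(int listenFd, Protocol proto, ServerRace &race) {
    int fd = accept(listenFd, nullptr, nullptr); // returns -1 once the socket is shut down
    std::lock_guard<std::mutex> lk(race.m);
    if (fd >= 0 && race.winner == Protocol::None) {
        race.winner = proto;
        race.connectedFd = fd;
        race.cv.notify_one();
    } else if (fd >= 0) {
        close(fd); // lost the race; drop the late connection
    }
}

Protocol runServers(int tcpListenFd, int shdmemListenFd) {
    ServerRace race;
    std::thread tcp(serveOne, tcpListenFd, Protocol::TcpIp, std::ref(race));
    std::thread shm(serveOne, shdmemListenFd, Protocol::Shdmem, std::ref(race));

    {
        std::unique_lock<std::mutex> lk(race.m);
        race.cv.wait(lk, [&] { return race.winner != Protocol::None; });
    }

    // shutdown() unblocks a pending accept() (close() alone isn't guaranteed
    // to), so both threads finish and can be joined without pthread_cancel.
    shutdown(tcpListenFd, SHUT_RDWR);
    shutdown(shdmemListenFd, SHUT_RDWR);
    tcp.join();
    shm.join();
    close(tcpListenFd);
    close(shdmemListenFd);
    return race.winner;
}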

Does it look good?

themarpe commented 3 weeks ago

Also, WRT the cross-compilation - any "unix"-only things, especially whole files, can be added either to:

To fix the CI

TheMutta commented 3 weeks ago

Hmm - do we need a special type of an event? This will likely not work due to it being shared with (fixed) Bootloader on RVC2 (eg communicating with it will cause issues).

So let's try to either:

  • Remove the need for a custom xLinkEventType entry, or
  • Reorder them to the end & fix the assert (e.g., to check between two ranges)

Hmm, I understand the issue. I think option 2 is the most convenient. That said, will the fact that the streamPacketDesc_t structure has been expanded to add the fd field cause issues with the RVC2 bootloader?
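A hedged sketch of option 2 with illustrative names (not the real xLinkEventType_t values): new requests are appended after the entries the fixed RVC2 bootloader already knows, and the validity check accepts either range.

#include <assert.h>

typedef enum {
    // Legacy events: values must stay fixed for the RVC2 bootloader
    XLINK_WRITE_REQ = 0,
    XLINK_READ_REQ,
    XLINK_READ_REL_REQ,
    XLINK_LAST_LEGACY_REQ = XLINK_READ_REL_REQ,
    // New events: appended at the end so legacy peers never see them
    XLINK_WRITE_FD_REQ,
    XLINK_LAST_REQ = XLINK_WRITE_FD_REQ,
} xLinkEventType_t;  // illustrative subset of the real enum

static void validateEvent(xLinkEventType_t t) {
    // Accept either the legacy range or the appended range
    assert((t >= XLINK_WRITE_REQ && t <= XLINK_LAST_LEGACY_REQ) ||
           (t > XLINK_LAST_LEGACY_REQ && t <= XLINK_LAST_REQ));
}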

TheMutta commented 3 weeks ago

I've reordered the requests so that the first few remain constant and new requests can be added without causing any issues for the existing ones.

The detection protocol has been fixed with threads, and the conditional compilation has been put in place.

The remaining CI failures seem to be caused by redefinitions of _findfirst64i32 or ftruncate, neither of which appears in the source code.

themarpe commented 3 weeks ago

That said, will the fact that the streamPacketDesc_t structure has been expanded to add the fd field cause issues with the RVC2 bootloader?

No, no issues - just the xLinkEventHeader_t structure and all the downstream types (enums, etc...)

The remaining CI failures seem to be caused by redefinitions of _findfirst64i32 or ftruncate, neither of which appears in the source code.

Might be a bug in the MinGW unix "translation" helpers, though I wasn't able to repro it on my machine. Let's leave it as is - we don't support MinGW officially anyway.

TheMutta commented 3 weeks ago

Merging the new shared-memory zero-copy protocol into the mainline branch.