sccn / labstreaminglayer

LabStreamingLayer super repository comprising submodules for LSL and associated apps.
Other
522 stars 157 forks source link

"Error trying to send a time packet: Invalid argument" when accessing a new stream #105

Closed mgrivich closed 1 year ago

mgrivich commented 1 year ago

I use the standard example SendStringMarkers. I use a modified version of ReceiveStringMarkers:

#include <iostream>
#include <lsl_cpp.h>
#include <memory>
#include <string>

int main(int, char**) {
    lsl::continuous_resolver resolver(60);
    std::string active_name = "";
    std::unique_ptr<lsl::stream_inlet> inlet;

    std::string sample;

    bool loaded = false;

    while(true) {
        std::vector<lsl::stream_info> streams = resolver.results();

        if(streams.size() > 0) {
            std::string current_name = streams[0].name() + ";" + streams[0].hostname() + ";" + streams[0].uid();

            if(active_name.compare(current_name) != 0) {
                inlet.reset(new lsl::stream_inlet(streams[0]));
                active_name = current_name;
                std::cout << "loade: " << current_name << std::endl;
                loaded = true;
            }
        }

        if(loaded) {
            double remote_time = 0.0;
            double uncertainty = 0.0;
            double new_correction = inlet->time_correction(&remote_time, &uncertainty);
            if(double ts = inlet->pull_sample(&sample, 1, 0.2)) {
                std::cout << sample << '\t' << new_correction << std::endl;
            }

        }
    }
    return 0;
}

What this does, is anytime stream 0 changes its identification, the receiver quickly reloads it. If I quickly exit and restart SendStringMarkers (while leaving the receiver open), the receiver will fail a significant fraction of the time with:

2022-12-07 15:16:03.007 (  10.023s) [R_MyEventStrea  ]      data_receiver.cpp:350    ERR| Stream transmission broke off (Input stream error.); re-connecting...
2022-12-07 15:16:05.025 (  12.041s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:05.276 (  12.292s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:05.514 (  12.530s) [R_MyEventStrea  ]      data_receiver.cpp:350    ERR| Stream transmission broke off (Invalid argument); re-connecting...
2022-12-07 15:16:05.526 (  12.543s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:05.777 (  12.793s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:06.028 (  13.044s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:06.278 (  13.294s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:06.529 (  13.545s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:06.779 (  13.796s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:07.017 (  14.033s) [R_MyEventStrea  ]      data_receiver.cpp:350    ERR| Stream transmission broke off (Invalid argument); re-connecting...
2022-12-07 15:16:07.030 (  14.046s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:07.281 (  14.297s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:07.531 (  14.548s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:07.782 (  14.798s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:08.032 (  15.049s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:08.283 (  15.299s) [T_MyEventStream ]      time_receiver.cpp:136   WARN| Error trying to send a time packet: Invalid argument
2022-12-07 15:16:08.519 (  15.536s) [R_MyEventStrea  ]      data_receiver.cpp:350    ERR| Stream transmission broke off (Invalid argument); re-connecting...

forever

I inspected timereceiver.cpp and found that conn.get_udp_endpoint().address is empty in time_reciever::send_nextpacket(), which causes the error. It looks like conn can be in an uninitialized and unhandled state when inlet->time_correction and/or inlet->pull sample is called.

I'm using LSL 1.16, and have seen this this on both linux and windows. I've seen it both within one computer and between two computers. Release build, 64 bit, optimization on (-02 on linux and /O2 on windows). This looks like a race condition bug (that is, it depends on which threads get processed first). So it may not occur with every compiler, compiler setting, and system.

This issue did not occur with LSL 1.13.

tstenner commented 1 year ago

I've been trying to reproduce this, but after killing and automatically restarting SendStringMarkers for half an hour I didn't encounter the error even once (both debug and release builds).

If I had to guess, there's something weird about the locking in the recovery process. Maybe https://github.com/tstenner/liblsl/commit/c0ea534e16806416215878f880a8c004f05a9d25 fixes the problem.

mgrivich commented 1 year ago

Your commit seems to have fixed the time_receiver.cpp bug. At least it suppressed the warning message. However, note there is also the same kind of error at datareceiver.cpp:342, which is thrown by conn.get_tcpendpoint() at line 154. conn.address() is empty again. Could you apply the same treatment to that file?

Thanks.

tstenner commented 1 year ago

The data receiver is notified of the new connection differently, so I'll have to see how what kind of locking is / isn't needed there. I can do it, but I'm kind of swamped at the moment and it might be a week or two until I get around to it.

mgrivich commented 1 year ago

Could I get a status update on this? This is preventing me from updating our apps to latest.

tstenner commented 1 year ago

I tried to reproduce it on a few free evenings, but haven't been able to reproduce it so far. It doesn't really help much that the only Windows PC I can install things on is the one I use for my taxes...

mgrivich commented 1 year ago

I was able to fix the remaining issue. Your changes, above, are still required. What was happening is that sometimes the source stream would re-appear as ipv6 instead of ipv4. The code below checks for and fixes the issue. I'd like this to be merged in the main branch. Let me know how you'd like me to go about that, or if you'd like to handle it. In inlet_connection.cpp, starting from line 204:

if (infos.size() == 1) {

    if(infos[0].v4address().empty() && infos[0].v6address().empty()) {
        continue;
    }

    // update the endpoint
    host_info_ = infos[0];

    // Source may have jumped ipv4 -> ipv6 or vice versa.
    // re-select TCP/UDP protocol versions
    if (api_config::get_instance()->allow_ipv6()) {
        // if IPv6 is optionally allowed...
        if (host_info_.v4address().empty() || !host_info_.v4data_port() ||
            !host_info_.v4service_port()) {
            // then use it but only iff there are problems with the IPv4 connection data
            tcp_protocol_ = tcp::v6();
            udp_protocol_ = udp::v6();
        } else {
            // (otherwise stick to IPv4)
            tcp_protocol_ = tcp::v4();
            udp_protocol_ = udp::v4();
        }
    } else {
        // otherwise use the protocol type that is selected in the config
        tcp_protocol_ = api_config::get_instance()->allow_ipv4() ? tcp::v4() : tcp::v6();
        udp_protocol_ = api_config::get_instance()->allow_ipv4() ? udp::v4() : udp::v6();
    }                                               

    // cancel all cancellable operations registered with this connection
    cancel_all_registered();
    .
    .
    .
}
tstenner commented 1 year ago

Ah, good to know the issue wasn't a mutex problem I missed after looking for it a few times.

I added a small helper method so the protocol selection can be used in both the constructor and the recovery path (https://github.com/sccn/liblsl/pull/195).

mgrivich commented 1 year ago

Thanks. Looks good now.