Closed mgrivich closed 1 year ago
I've been trying to reproduce this, but after killing and automatically restarting SendStringMarkers
for half an hour I didn't encounter the error even once (both debug and release builds).
If I had to guess, there's something weird about the locking in the recovery process. Maybe https://github.com/tstenner/liblsl/commit/c0ea534e16806416215878f880a8c004f05a9d25 fixes the problem.
Your commit seems to have fixed the time_receiver.cpp bug. At least it suppressed the warning message. However, note there is also the same kind of error at datareceiver.cpp:342, which is thrown by conn.get_tcpendpoint() at line 154. conn.address() is empty again. Could you apply the same treatment to that file?
Thanks.
The data receiver is notified of the new connection differently, so I'll have to see how what kind of locking is / isn't needed there. I can do it, but I'm kind of swamped at the moment and it might be a week or two until I get around to it.
Could I get a status update on this? This is preventing me from updating our apps to latest.
I tried to reproduce it on a few free evenings, but haven't been able to reproduce it so far. It doesn't really help much that the only Windows PC I can install things on is the one I use for my taxes...
I was able to fix the remaining issue. Your changes, above, are still required. What was happening is that sometimes the source stream would re-appear as ipv6 instead of ipv4. The code below checks for and fixes the issue. I'd like this to be merged in the main branch. Let me know how you'd like me to go about that, or if you'd like to handle it. In inlet_connection.cpp, starting from line 204:
if (infos.size() == 1) {
if(infos[0].v4address().empty() && infos[0].v6address().empty()) {
continue;
}
// update the endpoint
host_info_ = infos[0];
// Source may have jumped ipv4 -> ipv6 or vice versa.
// re-select TCP/UDP protocol versions
if (api_config::get_instance()->allow_ipv6()) {
// if IPv6 is optionally allowed...
if (host_info_.v4address().empty() || !host_info_.v4data_port() ||
!host_info_.v4service_port()) {
// then use it but only iff there are problems with the IPv4 connection data
tcp_protocol_ = tcp::v6();
udp_protocol_ = udp::v6();
} else {
// (otherwise stick to IPv4)
tcp_protocol_ = tcp::v4();
udp_protocol_ = udp::v4();
}
} else {
// otherwise use the protocol type that is selected in the config
tcp_protocol_ = api_config::get_instance()->allow_ipv4() ? tcp::v4() : tcp::v6();
udp_protocol_ = api_config::get_instance()->allow_ipv4() ? udp::v4() : udp::v6();
}
// cancel all cancellable operations registered with this connection
cancel_all_registered();
.
.
.
}
Ah, good to know the issue wasn't a mutex problem I missed after looking for it a few times.
I added a small helper method so the protocol selection can be used in both the constructor and the recovery path (https://github.com/sccn/liblsl/pull/195).
Thanks. Looks good now.
I use the standard example SendStringMarkers. I use a modified version of ReceiveStringMarkers:
What this does, is anytime stream 0 changes its identification, the receiver quickly reloads it. If I quickly exit and restart SendStringMarkers (while leaving the receiver open), the receiver will fail a significant fraction of the time with:
I inspected timereceiver.cpp and found that conn.get_udp_endpoint().address is empty in time_reciever::send_nextpacket(), which causes the error. It looks like conn can be in an uninitialized and unhandled state when inlet->time_correction and/or inlet->pull sample is called.
I'm using LSL 1.16, and have seen this this on both linux and windows. I've seen it both within one computer and between two computers. Release build, 64 bit, optimization on (-02 on linux and /O2 on windows). This looks like a race condition bug (that is, it depends on which threads get processed first). So it may not occur with every compiler, compiler setting, and system.
This issue did not occur with LSL 1.13.