Closed chauser closed 3 years ago
Have you tried one of the simple examples to see if they have the same behavior? For example, see https://github.com/robotpy/pynetworktables/blob/main/samples/nt_driverstation.py
Definitely would be happy to accept a PR with a fix. Unfortunately, I still don't actively use Windows, so this is pretty difficult for me to diagnose.
Setting the timeout to 2x would probably be the least risky change. Try it and let us know if that solves it?
Yes, I have done the doubled timeout test and it works. I have also called `NetworkTables.initialize(server="127.0.0.1")` from interactive Python and that works. But the doubled timeout is easy and will help enormously for Windows users.
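
The effect of doubling the parent's wait can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions, not the pynetworktables code: `connect_with_wait`, `child_delay`, and the timing values are hypothetical, and the child thread's sleep stands in for an IPv6 attempt that times out before the IPv4 attempt succeeds.

```python
import threading
import time


def connect_with_wait(child_delay, wait_timeout):
    """Start one child 'connector' thread and wait up to wait_timeout for it.

    child_delay is a hypothetical stand-in for the time create_connection
    needs (an IPv6 timeout followed by a successful IPv4 connect).
    Returns whatever the child stored before the parent stopped waiting.
    """
    cond = threading.Condition()
    state = {"result": None}

    def child():
        time.sleep(child_delay)        # simulated slow connect
        with cond:
            state["result"] = "connected"
            cond.notify()

    worker = threading.Thread(target=child, daemon=True)
    with cond:
        worker.start()
        cond.wait(wait_timeout)        # parent gives up after wait_timeout
        result = state["result"]       # still None if the child was too slow
    worker.join()                      # only so the demo exits cleanly
    return result


TIMEOUT = 0.5                          # assumed per-connection timeout
SLOW_CONNECT = 0.75                    # longer than TIMEOUT, shorter than 2 * TIMEOUT

print(connect_with_wait(SLOW_CONNECT, TIMEOUT))      # None
print(connect_with_wait(SLOW_CONNECT, 2 * TIMEOUT))  # connected
```

With the parent waiting only `TIMEOUT`, the child's late success is never observed; waiting `2 * TIMEOUT` leaves room for one timed-out attempt followed by a successful one.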
closed by #117
Background: I'm attempting to learn about frc-characterization using a Romi robot; the robot code runs in desktop debugging mode, so the related networktables server is also running on the desktop. The frc-characterization logging component is Python code that uses pynetworktables. When the team number in the logging tool is set to 0, the tool calls `NetworkTables.initialize(server="localhost")` to connect to the server. When running on my Windows 10 machines this never succeeds.

Here's what's happening.
`NetworkTables.initialize(server="localhost")` eventually turns into a call to Python's `socket.create_connection(("localhost", 1735), timeout=self.timeout)` in `tcp_connector.py`. `create_connection` tries both the IPv6 and IPv4 resolutions of "localhost", in that order, on Windows 10 if IPv6 is enabled. Of course there is no server running on IPv6, so that connection is refused. For reasons I don't understand, instead of immediately reporting a "connection refused" failure to the application, Windows chooses to wait for the timeout period to expire and then reports a "timeout" failure. It then proceeds to try the IPv4 address and that succeeds. However, the created connection is never seen by the pynetworktables code. Why?

Well, two things: first,
`TcpConnector.connect` is called not with `("localhost", 1735)`, which would use the straightforward, in-thread code for a single server, but rather with `[("localhost", 1735)]`, which uses the more complex multi-threaded code that tries to connect in parallel to multiple servers, accepting the result of whichever one finishes first. Second, if none of the threads succeeds within the timeout period, as monitored by a call to `self.cond.wait(self.timeout)` in the parent thread, this code returns `None` because none of the child threads has stored anything different.

The problem, thus, is that the child thread that calls `create_connection` succeeds (when it tries to connect using IPv4 after the IPv6 connection times out), but only after the parent thread has timed out and moved on.

I can see a number of possible fixes: simply setting the
`cond.wait` timeout in the parent thread to `2*self.timeout` will probably prevent it from happening. Calling `connect` with `("localhost", 1735)` rather than `[("localhost", 1735)]` would work. Calling with `"127.0.0.1"` rather than `"localhost"` would work.

I haven't tried it on Linux or macOS, but I suspect that it would not be a problem there, even if they try IPv6 first, given that they immediately report `ECONNREFUSED` rather than waiting for a timeout.

Finally, three more points:
- `tcp_connector.py` is also failure-prone when passed a multi-item list as `server_or_servers`. If I understand the code correctly, the first child thread to complete `notify`s `self.cond`. If that thread failed to create a connection, for example by receiving `ECONNREFUSED`, then the result from any later-completing thread that succeeds will never be used.
- The `socket.create_connection` behavior that I've described can be verified in an interactive Python session while simultaneously running Wireshark.
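
For the interactive check mentioned in the last point, a minimal sketch using only the standard library might look like the following. The helper names `resolve_order` and `timed_connect` are hypothetical, and port 1735 is simply the port quoted above; nothing here comes from pynetworktables.

```python
import socket
import time


def resolve_order(host, port):
    """Return (family, address) pairs in the order getaddrinfo reports them.

    socket.create_connection tries the resolved addresses in this order.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family.name, sockaddr[0]) for family, _, _, _, sockaddr in infos]


def timed_connect(host, port, timeout):
    """Attempt a TCP connection and return (outcome, elapsed_seconds).

    outcome is "connected" on success, otherwise the name of the exception
    class raised (for example a refused connection or a timeout).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except OSError as exc:
        outcome = type(exc).__name__
    return outcome, time.monotonic() - start
```

In an interactive session, `resolve_order("localhost", 1735)` shows whether the IPv6 address is listed first, and comparing the elapsed time from `timed_connect("localhost", 1735, timeout=1.0)` with that from `timed_connect("127.0.0.1", 1735, timeout=1.0)` should show whether an extra timeout period is being spent on the IPv6 attempt.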