robotpy / pynetworktables

Pure python implementation of the FRC NetworkTables protocol

pynetworktables client can't connect to networktables server on Windows 10 "localhost" #116

Closed · chauser closed this issue 3 years ago

chauser commented 3 years ago

Background: I'm attempting to learn about frc-characterization using a Romi robot. The robot code runs in desktop debugging mode, so the related NetworkTables server is also running on the desktop. The frc-characterization logging component is Python code that uses pynetworktables. When the team number in the logging tool is set to 0, the tool calls NetworkTables.initialize(server="localhost") to connect to the server. On my Windows 10 machines, this never succeeds.

Here's what's happening. NetworkTables.initialize(server="localhost") eventually turns into a call to Python's socket.create_connection(("localhost", 1735), timeout=self.timeout) in tcp_connector.py. create_connection tries each address that "localhost" resolves to, in order; on Windows 10 with IPv6 enabled, that means the IPv6 address first and then the IPv4 address. No server is listening on IPv6, so that connection is refused. For reasons I don't understand, instead of immediately reporting a "connection refused" failure to the application, Windows waits for the timeout period to expire and then reports a "timeout" failure. create_connection then tries the IPv4 address, and that succeeds. However, the pynetworktables code never sees the resulting connection. Why?
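The per-address behavior described above can be illustrated with a simplified re-creation of the loop inside socket.create_connection (no source_address handling; like the default behavior, it re-raises only the last error). It is a sketch for illustration: connect_in_order and candidate_addresses are hypothetical helper names, not part of pynetworktables. The key point is that every address returned by getaddrinfo gets its own full timeout, so one address that hangs delays the next.

```python
import socket


def candidate_addresses(host, port):
    """List (family name, address) pairs in the order create_connection tries them."""
    infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    return [(family.name, sockaddr[0]) for family, _t, _p, _c, sockaddr in infos]


def connect_in_order(host, port, timeout):
    """Simplified model of the address loop in socket.create_connection.

    Each candidate address gets its own full `timeout`. If the first one
    (for example IPv6 ::1) hangs until the timeout, the next one (for example
    IPv4 127.0.0.1) is only tried afterwards, so the whole call can take up
    to len(addresses) * timeout seconds.
    """
    last_err = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, 0, socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)  # per-attempt timeout, not a total budget
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            last_err = err  # remember the failure and move on to the next address
            if sock is not None:
                sock.close()
    raise last_err or OSError("getaddrinfo returned no addresses")
```

Printing candidate_addresses("localhost", 1735) on an affected machine should show whether the IPv6 entry is listed before the IPv4 one.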

Well, two things. First, TcpConnector.connect is called not with ("localhost", 1735), which would use the straightforward in-thread code for a single server, but with [("localhost", 1735)], which uses the more complex multi-threaded code that tries to connect to multiple servers in parallel and accepts the result of whichever one finishes first. Second, if none of the child threads succeeds within the timeout period, as monitored by a call to self.cond.wait(self.timeout) in the parent thread, this code returns None, because no child thread has stored a connection yet.

The problem, thus, is that the child thread that calls create_connection succeeds (when it tries to connect using IPv4 after the IPv6 connection times out), but only after the parent thread has timed out and moved on.
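The race described above can be reproduced without any networking. The sketch below is a hypothetical model of a "first connection wins" pattern built on threading.Condition, not the actual TcpConnector code: child threads run connection attempts, and the parent waits a bounded time for the first result. It uses Condition.wait_for instead of a bare cond.wait so that a result stored before the parent starts waiting is not missed; connect_parallel is an illustrative name.

```python
import threading


def connect_parallel(attempts, timeout):
    """Hypothetical model of a "first connection wins" parallel connect.

    `attempts` is a list of zero-argument callables. Each runs in its own
    child thread and either returns a connection object or raises OSError.
    The parent waits at most `timeout` seconds for any child to succeed.
    """
    cond = threading.Condition()
    result = {"conn": None}

    def worker(attempt):
        try:
            conn = attempt()
        except OSError:
            return  # this candidate failed; leave the shared result untouched
        with cond:
            if result["conn"] is None:
                result["conn"] = conn
                cond.notify_all()
        # If the parent has already given up, this late connection is stored
        # but never read: the lost-connection race from the issue.

    for attempt in attempts:
        threading.Thread(target=worker, args=(attempt,), daemon=True).start()

    with cond:
        # Wait up to `timeout`, then return whatever is stored (None if no
        # child has succeeded yet, even if one is about to).
        cond.wait_for(lambda: result["conn"] is not None, timeout)
        return result["conn"]
```

With an attempt that takes 0.5 s before succeeding, connect_parallel returns None when the parent waits only 0.1 s, even though the child eventually connects. Giving the parent a longer wait than a single attempt's timeout, such as the 2 * timeout proposed in this thread, is what lets it see a connection that first lost one full timeout on IPv6.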

I can see several possible fixes. Simply setting the cond.wait timeout in the parent thread to 2*self.timeout would probably prevent it. Calling connect with ("localhost", 1735) rather than [("localhost", 1735)] would work. Calling it with "127.0.0.1" rather than "localhost" would also work.
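The last workaround, passing "127.0.0.1" instead of "localhost", can be generalized with a small helper that asks getaddrinfo for IPv4 results only, so create_connection has a single candidate and never spends a timeout on IPv6. This is a sketch under the assumption that the server listens on IPv4; resolve_ipv4 is a hypothetical name, not part of pynetworktables.

```python
import socket


def resolve_ipv4(host, port=1735):
    """Return the first IPv4 address for `host` (e.g. "127.0.0.1" for "localhost").

    Restricting getaddrinfo to AF_INET skips the IPv6 candidate entirely.
    Raises socket.gaierror if the host has no IPv4 address.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return infos[0][4][0]  # sockaddr is (address, port) for AF_INET
```

The result could then be passed as NetworkTables.initialize(server=resolve_ipv4("localhost")), which has the same effect as passing "127.0.0.1" directly. The trade-off is that this gives up IPv6 entirely, which is fine for a local simulator but not a general fix.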

I haven't tried it on Linux or macOS, but I suspect it would not be a problem there, even if they try IPv6 first, since they report ECONNREFUSED immediately rather than waiting for a timeout.
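To check that suspicion on a given OS, a small timing probe can make one connection attempt to an address where nothing is listening and report whether it failed with an immediate refusal or a timeout. This is a hypothetical diagnostic, not part of pynetworktables.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Make one TCP connect attempt and report how it ended and how long it took.

    Returns (outcome, elapsed_seconds), where outcome is "connected",
    "refused", "timeout", or the class name of another OSError (for example
    when IPv6 is unavailable on this machine).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        outcome = "timeout"
    except OSError as err:
        outcome = type(err).__name__
    return outcome, time.monotonic() - start
```

Calling probe("::1", 1735) and probe("127.0.0.1", 1735) with no server running should show whether the OS refuses immediately or only fails after the timeout on each address family.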

Finally, three more points:

virtuald commented 3 years ago

Have you tried one of the simple examples to see if they have the same behavior? For example, see https://github.com/robotpy/pynetworktables/blob/main/samples/nt_driverstation.py

Definitely would be happy to accept a PR with a fix. Unfortunately, I still don't actively use Windows, so this is pretty difficult for me to diagnose.

virtuald commented 3 years ago

Setting the timeout to 2x would probably be the least risky change. Try it and let us know if that solves it?

chauser commented 3 years ago

Yes, I have tested the doubled timeout and it works. I have also called NetworkTables.initialize(server="127.0.0.1") from interactive Python, and that works too. But the doubled timeout is an easy change and will help Windows users enormously.

TheTripleV commented 3 years ago

closed by #117