Sometimes not connecting over FMS

cory2067 commented 8 years ago

In many matches, we've seen our vision system and other important variables drop out, and I now think this is due to pynetworktables (NT v3.0) failing to connect. Indeed, during one match, I could see that isConnected() was continually reading false. The interesting thing is that it works in some matches and not in others. When not connecting via FMS, it always works.

Since the LabVIEW dashboard was successfully showing network tables variables, I can conclude that this problem is one specific to pynetworktables.

virtuald commented 8 years ago

What hostname were you using to connect to your robot?

I had noticed a similar problem with v2.0 at our last competition. We were using mDNS, so I was thinking of switching to static IPs to see if that resolves the issue.

computer-whisperer commented 8 years ago

I also encountered this. I was using a pynetworktables2js server on the driver station, and found that I had to re-start the http server after FMS connection in order for it to work.

cory2067 commented 8 years ago

I was using mDNS.

I'll definitely try restarting my script after FMS connection in future events.

virtuald commented 8 years ago

Hm, I wonder what we're doing differently in python. I did notice in the pits that resolving mDNS names using pyfrc (for deployment) would take an excessively long period of time... wonder if there's an odd DNS issue in python. @cory2067 and @computer-whisperer: what version of python?

cory2067 commented 8 years ago

At least 3.4, not sure the exact number. Will need to check later this week.

ArchdukeTim commented 8 years ago

We had a similar issue at our match. It seemed that we had to start the server, stop it, then start it again for it to work. And sometimes we didn't connect at all, and the only way to get it working was rebooting the robot (I think we discovered that if you start the robot before connecting the laptop to the FMS, it didn't work).

computer-whisperer commented 8 years ago

Windows 7 Python 3.4.3 with DHCP+mDNS networking.

Kevin-OConnor commented 8 years ago

The LabVIEW Dashboard acquires the IP to connect to from a TCP connection with the Driver Station, which may have located the robot via DNS or static IP fallback (if your roboRIO is set to 10.TE.AM.2). This is one strong possibility as to why it was working in cases where your pyNetworkTables client using mDNS was not.
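For anyone unfamiliar with the 10.TE.AM.2 convention mentioned above, here is a minimal sketch of how the static address is derived from a team number. The function name `static_robot_ip` is made up for illustration and is not part of pynetworktables, and the 1 to 9999 range check is an assumption.

```python
def static_robot_ip(team):
    """Return the 10.TE.AM.2 address for a team number.

    Hypothetical helper: TE is the team number divided by 100 and
    AM is the remainder, so team 1234 maps to 10.12.34.2.
    """
    if not 0 < team <= 9999:  # assumed valid range for team numbers
        raise ValueError("team number out of range")
    return "10.{}.{}.2".format(team // 100, team % 100)
```

A client could pass the result of `static_robot_ip(1234)` as the server address instead of an mDNS name, provided the roboRIO is actually configured with that static IP.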

virtuald commented 8 years ago

Oh reallllly.... tell me more about this TCP connection with the Driver Station...

PeterJohnson commented 8 years ago

Dustin, take a look at the Dashboard LabVIEW code. It listens to TCP port 1741. One of the pieces of data that is sent is the robot IP address. Unfortunately there can only be one TCP client on this port, so if the LabVIEW dashboard is running you can't put your own server on it.

Kevin-OConnor commented 8 years ago

I don't think we have the protocol between the two documented, but you can see it in the LabVIEW dashboard template (connection is made in bottom right, Loop 7; data is processed in WPI_DashboardRetrieveStatus.vi; TCP is specifically in WPI_DashboardProcessTCPPacket.vi, in Loop 1). The Dashboard connects to the DS (on localhost, port 1741). The data is composed of tagged values, each laid out as:

U16: tag length (length of the tagged data following these 2 bytes)
U8: tag
Tag length - 1 bytes of tagged data

The tag number for Robot IP is 8. The IP is sent as a U32, which is LabVIEW's encoding of the IP (e.g. 172.22.11.2 = 0xAC160B02 = 2887125762).
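The framing described above could be parsed roughly as follows. This is only a sketch under stated assumptions, not the code that shipped in pynetworktables: the byte order is assumed to be big-endian (which matches the 0xAC 16 0B 02 example), the function and constant names are hypothetical, and skipping zero-length records is a defensive guess rather than documented behavior.

```python
import socket
import struct

ROBOT_IP_TAG = 8  # tag number for the robot IP, per the description above


def parse_tagged_values(buf):
    """Split a bytes buffer into (tag, payload) pairs.

    Returns the parsed pairs plus any trailing bytes that do not yet
    form a complete record, so the caller can prepend them to the
    next chunk read from the socket.
    """
    records = []
    offset = 0
    while len(buf) - offset >= 2:
        # The U16 length counts the tag byte plus the payload
        # (big-endian is an assumption).
        (length,) = struct.unpack_from(">H", buf, offset)
        if length == 0:
            offset += 2  # defensive: skip an empty record
            continue
        if len(buf) - offset - 2 < length:
            break  # incomplete record, wait for more data
        tag = buf[offset + 2]
        payload = buf[offset + 3 : offset + 2 + length]
        records.append((tag, payload))
        offset += 2 + length
    return records, buf[offset:]


def robot_ip_from_records(records):
    """Return the dotted-quad robot IP from the last tag-8 record, or None."""
    ip = None
    for tag, payload in records:
        if tag == ROBOT_IP_TAG and len(payload) == 4:
            ip = socket.inet_ntoa(payload)
    return ip
```

In use, a client would connect to localhost port 1741, feed each received chunk (plus the leftover bytes from the previous call) into `parse_tagged_values`, and reconnect its NetworkTables client whenever `robot_ip_from_records` reports a new address.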

virtuald commented 8 years ago

@Kevin-OConnor thanks for the information, I got it working on a VM and a roborio I have here. Our last district event is this weekend, so I'll try this out and see if that improves our situation.

@PeterJohnson does ntcore have support for this already then?

virtuald commented 8 years ago

Fixed and released in 2015.3.2, and in 2016.0.0a2. Also added support to pynetworktables2js in 2015.2.2.

@cory2067 I don't have a LabVIEW setup, so I have not tested 2016.0.0a2. However, as Peter mentioned, since the DS is connecting to a server, this fix won't allow you to run the LabVIEW dashboard and pynetworktables concurrently on the same machine. If that is a requirement, you'll want to switch to a static IP setup.

PeterJohnson commented 8 years ago

No, ntcore does not have support for this protocol, and I don't think it should (it's really not the right place for it; it expects the "user" program like the dashboard application to provide an appropriate IP to connect to). However, I'm thinking we should also add support for this protocol to SmartDashboard to help alleviate the issues teams are seeing with it sometimes not connecting. It should be straightforward to do so, but unfortunately I don't have much time this week to work on it.

virtuald commented 8 years ago

Does ntcore allow changing the IP address after initialization? I know one of the limitations of the old protocol was that it can only be set before initialization, which is why I included it here.

It does seem like the type of feature that could benefit from deep integration with ntcore (maybe via some interface to keep it decoupled).

PeterJohnson commented 8 years ago

You have to call StopClient() before you call StartClient() again, but yes, you can change the server address in ntcore at runtime. If it wasn't possible before, that was an implementation (or API) constraint, not a protocol constraint.

As ntcore runs on the robot, the DS, and coprocessors, and possibly multiple instances of it might be running simultaneously on the DS and/or coprocessor (e.g. SmartDashboard plus GRIP plus a user program), the approach of getting the IP address would need to change before I would consider integrating it. In future years, we could theoretically solve the narrow problem of multiple DS-local apps by doing a doorbell-type server in the DS app and connecting to localhost (e.g. reversing the current dashboard TCP connection), but that doesn't solve the coprocessor problem. If the robot IP address is not fixed, some discovery method is required. While mDNS has issues, adding a doorbell protocol to the DS app doesn't seem like the best solution, as after all the DS still needs to find the robot! If we had more consistent DNS, that would likely be the best possible method. I think the main thing I'd need to add to ntcore in that case would be to take a list of client addresses to iterate over rather than just one, so it would e.g. try "roborio-xx.lan", then "roborio-xx.local", etc.
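The "list of client addresses" idea might look something like the sketch below. It is not ntcore's API: `first_reachable` and its parameters are hypothetical, the hostnames in the usage note just follow the pattern quoted above, and 1735 is used as the NetworkTables port. The `connect` argument exists only so the loop can be exercised without a network.

```python
import socket


def first_reachable(candidates, port=1735, timeout=1.0,
                    connect=socket.create_connection):
    """Try each candidate host in order and return the first that connects.

    Returns (host, sock) on success, or (None, None) if every
    candidate fails to resolve or refuses the connection.
    """
    for host in candidates:
        try:
            sock = connect((host, port), timeout)
        except OSError:
            # Covers DNS failures, timeouts, and refused connections.
            continue
        return host, sock
    return None, None
```

A caller could pass something like `["roborio-1234.lan", "roborio-1234.local", "10.12.34.2"]` so that a slow or broken mDNS lookup falls through to the next name and finally to the static address.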

robotpy / pynetworktables

Sometimes not connecting over FMS #25