robotpy / pynetworktables

Pure python implementation of the FRC NetworkTables protocol

reconnect errors #63

Open virtuald opened 6 years ago

virtuald commented 6 years ago

Some kind of queue buildup or race condition during reconnects...

Reference: https://www.chiefdelphi.com/forums/showthread.php?t=164590

auscompgeek commented 6 years ago

I wonder if this is an ntcore bug as well that we've inherited. Both Shuffleboard and pynetworktables2js proved to be not entirely reliable at competitions for my team this year.

(I've also seen other teams have to restart SmartDashboard whilst on the field, so definitely not just my team.)

virtuald commented 6 years ago

@PeterJohnson thoughts on that possibility?

andrewda commented 6 years ago

As I mentioned in that CD thread, waiting until we can ping the roboRIO gives us a much higher success rate. We had about a 30% chance of it working when we booted the roboRIO and vision code simultaneously; adding the delayed start raised that to around 80% in my tests. It's still not perfect and there's still some funkiness going on, but it's a heck of a lot better. I'd suggest that as a temporary workaround until a fix for this is found.
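
For anyone wanting to try the delayed start, here is a minimal sketch of the idea. It is not the code from the CD thread: instead of an ICMP ping it swaps in a plain TCP connection attempt to port 1735, and `wait_for_port` and `start_vision` are hypothetical helper names chosen for illustration. The timeout and retry interval are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host, port=1735, timeout=60.0, interval=1.0):
    """Return True once a TCP connect to (host, port) succeeds, False on timeout.

    Hypothetical helper: a TCP connect stands in for the "ping" described above.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Success only means something is accepting connections on that port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def start_vision(server="10.25.21.2"):
    # Imported here so wait_for_port stays usable without pynetworktables installed.
    from networktables import NetworkTables

    if not wait_for_port(server):
        print("roboRIO not reachable yet; initializing anyway")
    NetworkTables.initialize(server=server)
```

Calling `start_vision()` at the top of the vision script delays `NetworkTables.initialize` until the roboRIO accepts a connection on 1735. Given the roughly 80% success rate reported above, this only narrows the window for the problem rather than fixing it.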

andrewda commented 6 years ago

Seems like this fix stopped working today. We're connecting to NetworkTables (logs show that we get all the initial data on the coprocessor), but NetworkTables.isConnected is still false and we're unable to add new values. We can reach the roboRIO, including port 1735 on it.

virtuald commented 6 years ago

I haven't taken the time to dig into this yet... unfortunately I imagine it has to be difficult to reproduce (seeing as I haven't had the issue).

Have you upgraded to pynetworktables 2018.1.1 yet? That addresses some unicode handling issues that could cause a connection to fail.

virtuald commented 6 years ago

@andrewda do you use networktables flush in your code anywhere? https://github.com/wpilibsuite/ntcore/issues/275 sounds vaguely related.

andrewda commented 6 years ago

Apologies for not replying to your message from a week ago: I did update to 2018.1.1 at CMP and it didn't seem to make a difference.

A while ago, in an attempt to find a fix for this problem, I did try adding a flush immediately after attempting to initialize the connection, i.e. something like:

from networktables import NetworkTables
NetworkTables.initialize(server="10.25.21.2")
NetworkTables.flush()

I never removed it after my tests on 2018.0.1 (it didn't seem to make a difference at the time), so once I get access to a robot again I can try removing the flush or calling it regularly, as wpilibsuite/ntcore#275 suggests.

virtuald commented 6 years ago

Actually, that would be a good thing to try -- calling flush continuously.

If you're not already calling flush more than once, I wouldn't expect that bug to affect you. If calling flush a lot of times fixes it for you, that would be very interesting to know and would narrow down the potential culprits.
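
A rough sketch of what "calling flush continuously" could look like, for anyone wanting to run the experiment. `start_periodic_flush` and `main` are hypothetical names, and the 100 ms interval is an arbitrary assumption rather than a recommended value. The flush callable is passed in, so the loop itself does not depend on pynetworktables.

```python
import threading


def start_periodic_flush(flush, interval=0.1):
    """Call flush() every `interval` seconds on a daemon thread.

    Returns a threading.Event; set it to stop the loop.
    """
    stop = threading.Event()

    def run():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            flush()

    threading.Thread(target=run, name="nt-flush", daemon=True).start()
    return stop


def main(server="10.25.21.2"):
    # Imported here so the helper above can be exercised without pynetworktables.
    from networktables import NetworkTables

    NetworkTables.initialize(server=server)
    # Keep the returned event; call .set() on it to end the experiment.
    return start_periodic_flush(NetworkTables.flush)
```

If the connection behaves differently with the loop running than with the single flush after `initialize`, that would point towards the behaviour described in wpilibsuite/ntcore#275.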