UnicodeDecodeError trying to connect to Network Tables on roboRIO (Java) from raspberry pi with Microsoft Lifecam

jousley commented 7 years ago

UnicodeDecodeError trying to connect to Network Tables on roboRIO (Java) from Raspberry Pi with Microsoft Lifecam plugged into roboRIO.

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> from networktables import NetworkTables as NT
>>> NT.initialize('10.35.28.32')
INFO:nt:NetworkTables 2017.0.4 initialized in client mode
>>> DEBUG:nt:client connected
ERROR:nt:Unhandled exception during handshake
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ntcore/network_connection.py", line 240, in _readThreadMain
    handshake_success = self.m_handshake(self, _getMessage, self._sendMessages)
  File "/usr/local/lib/python2.7/dist-packages/ntcore/dispatcher.py", line 488, in _clientHandshake
    msg = get_msg()
  File "/usr/local/lib/python2.7/dist-packages/ntcore/network_connection.py", line 228, in _getMessage
    return Message.read(self.m_stream, decoder, self.m_get_entry_type)
  File "/usr/local/lib/python2.7/dist-packages/ntcore/message.py", line 123, in read
    value = codec.read_value(value_type, rstream)
  File "/usr/local/lib/python2.7/dist-packages/ntcore/wire.py", line 126, in read_value
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/usr/local/lib/python2.7/dist-packages/ntcore/wire.py", line 198, in read_string_v3
    return rstream.read(slen).decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf0 in position 61: invalid continuation byte
INFO:nt:DISCONNECTED 10.35.28.32 port 1735 (Robot)

virtuald commented 7 years ago

I tried some simple things (mostly by adding unicode values to keys and strings) to reproduce your error and wasn't able to do so. If you can reproduce this, please try to figure out what values are in NetworkTables at the time -- maybe via a screenshot of OutlineViewer, or something... look for weird non-ascii values. Another thing you can do is edit /usr/local/lib/python2.7/dist-packages/ntcore/wire.py and catch that exception, and when it occurs then print out the string that's causing the issue.
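The last suggestion, catching the exception and printing the offending string, might look roughly like the sketch below. It is not the actual `wire.py` code: `read_string_debug` is a hypothetical helper, and `io.BytesIO` stands in for the real TCP stream on the assumption that the stream exposes a `read(n)` method returning bytes.

```python
import io
import logging

logger = logging.getLogger("nt.debug")


def read_string_debug(rstream, slen):
    """Read slen bytes and decode them as UTF-8, logging the raw bytes on failure.

    Hypothetical helper for illustration only; not part of pynetworktables.
    """
    raw = rstream.read(slen)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # repr() keeps the output ASCII-safe, so it can be pasted into a bug report
        logger.error("bad string at byte offset %d: %r", exc.start, raw)
        raise


# Usage with an in-memory stand-in for the network stream
good = read_string_debug(io.BytesIO(b"vision/target"), 13)
```

Re-raising after logging keeps the original behavior (the handshake still fails) while recording the bytes that triggered it.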

virtuald commented 7 years ago

Any progress here?

TurtleEmperorx commented 7 years ago

I have run into this issue as well and have no solution yet.

virtuald commented 7 years ago

I'm not currently able to reproduce this bug. If you're able to reproduce it reliably and provide a way for me to do so, I can fix it. Otherwise, upgrade to Python 3 and I suspect the problem will disappear.

virtuald commented 7 years ago

Two more reports of this, both on python 3:

DEBUG:nt:client connected
DEBUG:nt:NetworkConnection stopping (<ntcore.network_connection.NetworkConnection object at 0x712411b0>)
ERROR:nt:Unhandled exception during handshake
Traceback (most recent call last):
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/network_connection.py", line 240, in _readThreadMain
    handshake_success = self.m_handshake(self, _getMessage, self._sendMessages)
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/dispatcher.py", line 488, in _clientHandshake
    msg = get_msg()
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/network_connection.py", line 228, in _getMessage
    return Message.read(self.m_stream, decoder, self.m_get_entry_type)
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/message.py", line 123, in read
    value = codec.read_value(value_type, rstream)
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 126, in read_value
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 126, in <listcomp>
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 198, in read_string_v3
    return rstream.read(slen).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 47: invalid continuation byte
INFO:nt:DISCONNECTED 10.0.66.2 port 1735 (Robot)
DEBUG:nt:write thread died (<ntcore.network_connection.NetworkConnection object at 0x70088430>)

16:21:39:014 ERROR   : nt                  : Unhandled exception during handshake
Traceback (most recent call last):
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/network_connection.py", line 240, in _readThreadMain
    handshake_success = self.m_handshake(self, _getMessage, self._sendMessages)
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/dispatcher.py", line 488, in _clientHandshake
    msg = get_msg()
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/network_connection.py", line 228, in _getMessage
    return Message.read(self.m_stream, decoder, self.m_get_entry_type)
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/message.py", line 123, in read
    value = codec.read_value(value_type, rstream)
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/wire.py", line 126, in read_value
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/wire.py", line 126, in <listcomp>
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/usr/local/var/pyenv/versions/dashboard/lib/python3.6/site-packages/ntcore/wire.py", line 198, in read_string_v3
    return rstream.read(slen).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 89: invalid start byte
16:21:39:014 INFO    : nt                  : DISCONNECTED 10.24.3.2 port 1735 (Robot)

There's definitely an issue, but I need more details and need to be able to reproduce it; otherwise I can't help you fix it:

Does it happen every time, or just sometimes?
What robot code are you using (LabVIEW? Java? C++?)
What dashboards are you using? LabVIEW? SmartDashboard? Custom?
Are you using CameraServer -- what cameras are attached to the robot?
I would really love for someone to connect with OutlineViewer and take a screenshot of all of the keys/values in the table -- particularly any weird non-ASCII characters -- to see if we can nail this down. Though, I'm starting to suspect this is a protocol incompatibility with LabVIEW -- but I don't have LabVIEW available to me.
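The two reports above fail with different messages ("invalid continuation byte" for 0xf0, "invalid start byte" for 0xbd). A minimal sketch of what each message means, using made-up byte strings rather than anything captured from a robot; `describe_decode_failure` is a hypothetical helper name:

```python
def describe_decode_failure(raw):
    """Return (offset, reason) for the first UTF-8 decode error in raw, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# 0xf0 announces a 4-byte sequence, but the next byte is plain ASCII
print(describe_decode_failure(b"key\xf0abc"))
# 0xbd is a continuation byte, so it cannot begin a character
print(describe_decode_failure(b"key\xbdabc"))
```

Either way, the bytes handed to `decode` are not valid UTF-8 text; the message only says which rule the first bad byte broke.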

virtuald commented 7 years ago

Also, if you are able to reproduce this reliably, something that can help me diagnose this is adding the following code to the top of your main py file where logging is initialized:

https://gist.github.com/virtuald/65eed85ac579000eec14a40f41f47287

TurtleEmperorx commented 7 years ago

Sorry, but we just reverted the code to an earlier version, because we were in a hurry and lost the copy that produced the bug.

virtuald commented 7 years ago

So it was something in your code?

ArchdukeTim commented 7 years ago

I'm curious if this is random, or if something specific in code is causing it.

TurtleEmperorx commented 7 years ago

We don't know

ThePlasmaGuy commented 7 years ago

That second example above from an hour ago was us. We're using a custom dashboard running off pynetworktables2js, so it's not likely an issue with our code. And it's also not an instance issue since we were receiving the same issue on multiple computers (one on macOS and one on Windows 7). It seems to be something related to the FMS or the new router firmware, since everything worked fine before the router was flashed with competition firmware. The firmware flash also caused issues with networking with our pis, but that is likely unrelated.

Hope we can figure this out!!

ArchdukeTim commented 7 years ago

It's possible you're trying to add a weird character to nt, and it's not having it. Anything like that TPG?

virtuald commented 7 years ago

Hm, perhaps the packets are getting corrupted somehow by the 2017 router (though, then why isn't ntcore crashing.. maybe it's not trying to encode/decode the characters?). It would be useful to look at OutlineViewer and see if there is any gibberish in that output.

Do you have a 2016 router -- those are legal to use.

ArchdukeTim commented 7 years ago

@virtuald we should've had problems then too... Sounds like a really, really far edge case

ThePlasmaGuy commented 7 years ago

I've looked through our robot and pi code and don't see anything besides floats and alphabetical strings being sent over network tables. So unless SmartDashboard (which we keep backward compatibility with because our dashboard isn't working right now) is doing something weird, we aren't throwing random Unicode in our network tables.

ThePlasmaGuy commented 7 years ago

We're currently using the 2016 router with the competition firmware, although we got the same symptoms on the 2017 router.

virtuald commented 7 years ago

I just pushed a package to pypi -- 2017.0.7a1 ... it tells python to ignore the bad unicode characters when it sees them. I haven't tried it much, but that may fix the issue for now. I would like to know why the error is occurring though, so if this addresses it, a screenshot of OutlineViewer (or anything else showing weird characters) would be useful.
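For readers unfamiliar with how Python can be told to skip undecodable bytes, here is a minimal sketch of the standard `errors` argument to `bytes.decode`. Which handler the alpha package actually uses is not stated in this thread; `errors="ignore"` is an assumption based on the description above, and the byte string is made up.

```python
raw = b"cam\xf0era"

ignored = raw.decode("utf-8", errors="ignore")    # drops the undecodable byte
replaced = raw.decode("utf-8", errors="replace")  # substitutes U+FFFD

print(repr(ignored))
print(repr(replaced))
```

Both handlers stop the exception, but neither explains where the bad byte came from, so the underlying cause still needs diagnosing.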

ArchdukeTim commented 7 years ago

could you make the update print out the bad string, or no?

ThePlasmaGuy commented 7 years ago

Wouldn't that cause display errors on Windows? I know the Windows command prompt doesn't like displaying Unicode, at least in my experience...

virtuald commented 7 years ago

If this fixes it, then we can talk about creating ways to diagnose it further.

ThePlasmaGuy commented 7 years ago

Ok, I'll update and try again in the morning :+1:

Daltz333 commented 7 years ago

Same error on the updated PyNetworkTables.

02:54:39:940 DEBUG   : nt                  : NetworkConnection stopping (<ntcore.network_connection.NetworkConnection object at 0x7125e090>)
02:54:39:950 ERROR   : nt                  : Unhandled exception during handshake
Traceback (most recent call last):
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/network_connection.py", line 240, in _readThreadMain
    handshake_success = self.m_handshake(self, _getMessage, self._sendMessages)
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/dispatcher.py", line 488, in _clientHandshake
    msg = get_msg()
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/network_connection.py", line 228, in _getMessage
    return Message.read(self.m_stream, decoder, self.m_get_entry_type)
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/message.py", line 123, in read
    value = codec.read_value(value_type, rstream)
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 130, in read_value
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 130, in <listcomp>
    return Value.makeStringArray([self.read_string(rstream) for _ in range(alen)])
  File "/home/pi/.virtualenvs/cv/lib/python3.4/site-packages/ntcore/wire.py", line 206, in read_string_v3
    return rstream.read(slen).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 68: invalid continuation byte

Locals at innermost frame:

{ 'rstream': <ntcore.tcpsockets.tcp_stream.TCPStream object at 0x6eed2410>,
  'self': <ntcore.wire.WireCodec object at 0x7124ae90>,
  'slen': 117}

02:54:39:952 INFO    : nt                  : DISCONNECTED 10.0.66.2 port 1735 (Robot)
02:54:39:953 DEBUG   : nt                  : write thread died (<ntcore.network_connection.NetworkConnection object at 0x6eed24f0>)

Daltz333 commented 7 years ago

Here's some information on our setup. RoboRIO static IP set to 10.0.66.2, Bridge is configured for comp mode. Raspberry Pi is static IP set to 10.0.66.12. IP Camera is set to 10.0.66.11. We do have a second USB camera but that's only streaming to smart dashboard. The error happens regardless of SmartDash being opened or robot being enabled. It also happens to my second Pi with the same setup.

ArchdukeTim commented 7 years ago

Is it only happening to people using rPIs?

virtuald commented 7 years ago

@Daltz333 you aren't using the updated version -- read_string_v3 is at line 196: https://github.com/robotpy/pynetworktables/blob/unicode-fix/ntcore/wire.py#L196

You may want to do pip install -U pynetworktables --pre or pip install pynetworktables==2017.0.7a1
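One way to confirm which version the running interpreter actually sees (useful when a virtualenv is involved) is sketched below. It assumes Python 3.8 or newer for `importlib.metadata`; on the older interpreters used in this thread, `pip show pynetworktables` from the same environment gives the same information. `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # assumption: Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of dist_name, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed here, otherwise None
print(installed_version("pynetworktables"))
```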

virtuald commented 7 years ago

@ThePlasmaGuy did it work for you?

Daltz333 commented 7 years ago

@virtuald Here's a stackexchange post I made (before the update). I will update tomorrow and get back to you. http://robotics.stackexchange.com/questions/11840/first-robotics-competition-pynetworktables-nt-thread-died?noredirect=1#comment21072_11840

ThePlasmaGuy commented 7 years ago

@virtuald We forgot to test at comp due to the craziness of the last day of competition. However, I did preserve the entire control system out of bag so I can test with the new pynetworktables before we flash the radio for practice.

virtuald commented 7 years ago

Any more progress here?

ThePlasmaGuy commented 7 years ago

We have yet to have time to set up the test bed due to the fact that our school is on break this week. I'm planning on setting up the test bed with the problematic radio, Rio, and pis when I can get access to the hardware on Monday and Tuesday, so I'll get back to you after that.

Daltz333 commented 7 years ago

I was somehow able (I don't know how I did it) to reproduce the error on my team's test robot. I updated it to the new pynetworktables and the error disappeared. But since I don't know how I reproduced it, I can't guarantee that the update did indeed solve my problem.

virtuald commented 7 years ago

Heh. If someone is able to reproduce it, I have a branch locally with code that should be able to record the network stream.... I'm tired though, so I'll push it later.

virtuald commented 7 years ago

If you are able to reproduce this error locally, please install the branch at https://github.com/robotpy/pynetworktables/tree/handshake-debug ... basically, you have to:

git clone https://github.com/robotpy/pynetworktables
cd pynetworktables
git checkout handshake-debug
python setup.py sdist
copy dist/pynetworktables*.gz to your system
pip3 install -U pynetworktables*.gz

If you do it correctly, then when it crashes there should be a file called 'file.bin' in the directory you launched the code from. Send me an email with that file.

ThePlasmaGuy commented 7 years ago

Update: Things seem to be working right now. I'm running the most recent beta pynetworktables, but not even getting any logs so I don't know. However, I'm currently running off a little testbed setup without our raspberry pis, so idk if that's the thing causing the issue. I'll try setting that up and testing with that variable as well in the coming days, once we get our practice bot up and operational with the competition hardware.

denchief1 commented 7 years ago

Information I found at our competition: this error only occurs when the Rio is functioning as the server. If the server is the driver station then it works fine and publishes data. We use a Jetson TX1. Also, the Rio cannot connect to another device as a client. Our robot is still flashed from competition so I will use the branch this weekend.

denchief1 commented 7 years ago

Another thing is we use python 2.7 and it was working at home just fine.

virtuald commented 7 years ago

Is this reproducible in the pits with a field-configured radio, or only on the field? Maybe I can borrow a radio for a bit to try and reproduce this.

denchief1 commented 7 years ago

It is reproducible in the pit

ThePlasmaGuy commented 7 years ago

@denchief1 are you running the most recent version of pynetworktables? (The betas posted above) Is the TX1 the only device on the network? For us, the most recent beta seemed to fix things, although we didn't have our pis on the network at the time.

Are you getting the unicode error when trying to connect to the rio, or is it just not able to post? We found that our raspberry pis were unable to connect to the rio when they were running off a separate radio port from the rio, and they worked fine when we ran the rio and the pis off a switch off a single radio port.

denchief1 commented 7 years ago

We are running the betas. I no longer get the unicode error; however, the Jetson still cannot connect. The patch just seems to suppress the error. Our network is the Rio running to the first radio port and then a switch running from the radio. The Jetson is plugged into the switch.

ThePlasmaGuy commented 7 years ago

I would try running the switch off the primary port and plugging the roborio into another port on the switch. That's legal as of last week's game update and it's allowed things to work with our raspberry pis for us. We were still having custom dashboard issues, but the beta has seemed to fix that. The two router ports apparently use different protocols and that has caused issues with certain devices iirc.

If you're running the most recent beta that @virtuald posted a couple days ago and it's actually still causing errors, then it should be generating a log file that you should send him.

denchief1 commented 7 years ago

Is that the branch? Or is it the pynetworktables pypi file?

denchief1 commented 7 years ago

I will try the plugging of the Rio into the switch during our unbag time. (We are in Michigan)

ThePlasmaGuy commented 7 years ago

The version with the log file is the one you have to build yourself off the branch:

If you are able to reproduce this error locally, please install the branch at https://github.com/robotpy/pynetworktables/tree/handshake-debug ... basically, you have to:

git clone https://github.com/robotpy/pynetworktables
cd pynetworktables
git checkout handshake-debug
python setup.py sdist
copy dist/pynetworktables*.gz to your system
pip3 install -U pynetworktables*.gz

If you do it correctly, then when it crashes there should be a file called 'file.bin' in the directory you launched the code from. Send me an email with that file.

Yeah, we tested the switch during week 2 and it seemed to work fine. Our particular switch was giving us a few comms issues, but from talking to other teams who use that "roborio into switch" setup (The Highrollers (987) in particular), that's just an issue with our switch.

PeterJohnson commented 7 years ago

If you're running into any connectivity issues, try to switch everything to fixed IPs first (10.TE.AM.x with netmask 255.0.0.0). The robot should be .2, the DS .5, and everything else arbitrary above .5.

Everything has to work pretty much perfectly for mDNS to work.

Peter
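The 10.TE.AM.x scheme can be sketched as a small function. `team_ip` is a hypothetical helper, and the four-digit upper bound on team numbers is an assumption; the example team numbers are chosen only because they produce addresses of the same shape as those in the logs above.

```python
def team_ip(team, host):
    """Build a 10.TE.AM.x address from a team number and the last octet."""
    if not 1 <= team <= 9999:  # assumption: team numbers have at most four digits
        raise ValueError("team number out of range")
    if not 1 <= host <= 254:
        raise ValueError("host octet out of range")
    # TE is the team number without its last two digits, AM is the last two digits
    return "10.{}.{}.{}".format(team // 100, team % 100, host)


print(team_ip(66, 2))    # robot
print(team_ip(66, 5))    # driver station
print(team_ip(2403, 2))  # robot for a four-digit team number
```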

denchief1 commented 7 years ago

We are running static ips. I will try the branch build this week.

ThePlasmaGuy commented 7 years ago

Connectivity issues are sometimes happening over the second radio port regardless of static IP addresses, etc simply because of the different protocols used on both ports.

virtuald commented 7 years ago

@denchief1 If you can reproduce it and get me that logfile, I would be very happy to see it.

andrewnabors commented 7 years ago

We are having this issue too at competition, running on our TX1. It's been an on and off issue for us. I tried installing the beta as described above by @virtuald but I'm not sure it's working correctly. I get a permission denied when trying to install the old. I ran with sudo and got other warnings but no permission errors.

We are running everything static and Python 2.7

ThePlasmaGuy commented 7 years ago

Update from yesterday:

We hooked the competition radio, rio, and both raspberry pis up to our practice bot so it was in the same state as it was at competition 2 weeks ago. While I wasn't getting the issue on my test bed with only the rio and radio hooked up, adding the raspberry pis caused the issue to return once more. Because the raspberry pi's were most definitely triggering the issue, we went into the CV code we were running on the pis and checked all of the network tables code to see if something could be triggering the unicode nt error. We noticed that we were using .putNumber to send string values over our vision table, and in case this was the issue, we switched those functions to use .putValue instead. Since changing this function over, I haven't been able to get the unicode nt error.

I'm not sure if it's related, but switching our .putNumber functions to use .putValue instead has seemed to fix things.

When I get back to the shop tomorrow, I'll try switching that back and using the beta version to try to generate some of those log files. (I didn't realize until afterwards that my local pyenv virtual environment was causing me to use the pypi version instead of the beta version when I ran programs in my pynetworktables2js-based Dashboard folder...)
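To guard against sending a value through the wrong typed method, one option is a small dispatcher like the sketch below. `put_checked` and `RecordingTable` are hypothetical; the `putBoolean`, `putNumber`, and `putString` names are assumed to mirror the table methods in pynetworktables. Whether the string-through-`.putNumber` calls actually caused the unicode error is not confirmed in this thread.

```python
import numbers


def put_checked(table, key, value):
    """Call the put method that matches value's type (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return table.putBoolean(key, value)
    if isinstance(value, numbers.Real):
        return table.putNumber(key, float(value))
    if isinstance(value, str):
        return table.putString(key, value)
    raise TypeError("unsupported value type for %r: %s" % (key, type(value).__name__))


class RecordingTable:
    """Stand-in for a NetworkTables table so the sketch runs without a robot."""

    def __init__(self):
        self.calls = []

    def putBoolean(self, key, value):
        self.calls.append(("putBoolean", key, value))

    def putNumber(self, key, value):
        self.calls.append(("putNumber", key, value))

    def putString(self, key, value):
        self.calls.append(("putString", key, value))


table = RecordingTable()
put_checked(table, "angle", 12.5)
put_checked(table, "status", "locked")
print(table.calls)
```

Raising `TypeError` for anything else makes a mistyped value fail loudly on the sender instead of reaching the wire.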

robotpy / pynetworktables

UnicodeDecodeError trying to connect to Network Tables on roboRIO (Java) from raspberry pi with Microsoft Lifecam #42