'utf8' codec can't decode byte 0xc2 in position 0

tbird20d / grabserial

Grabserial - python-based serial dump and timing program - good for embedded Linux development

GNU General Public License v2.0


'utf8' codec can't decode byte 0xc2 in position 0 #21

Closed: KcMeterCEC closed this issue 5 years ago

KcMeterCEC commented 6 years ago

Hi, I cloned your repository locally and installed it with sudo python setup.py install. I want to measure the boot time of the Linux kernel, so I typed:

sudo grabserial -v -d /dev/ttyUSB0 -e 50 -t -m "^U-Boot*"

Unfortunately, grabserial crashed after a few seconds. This is the Python traceback:

  File "/usr/local/bin/grabserial", line 4, in <module>
    __import__('pkg_resources').run_script('grabserial==1.9.6', 'grabserial')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/grabserial-1.9.6-py2.7.egg/EGG-INFO/scripts/grabserial", line 530, in <module>
  File "/usr/local/lib/python2.7/dist-packages/grabserial-1.9.6-py2.7.egg/EGG-INFO/scripts/grabserial", line 419, in grab
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc2 in position 0: unexpected end of data

TarjeiD commented 6 years ago

I've experienced the same problem in another setup and fixed it with:

# Ignore the malformed data and continue without further notice
x = sd.read(1).decode(sys.stdout.encoding, "ignore")
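To illustrate why a single byte can fail to decode, and what "ignore" does about it, here is a minimal sketch. It assumes Python 3 and a UTF-8 stdout encoding; the variable names are made up for illustration and are not taken from grabserial.

```python
# 0xc2 is the lead byte of a two-byte UTF-8 sequence, so decoding it
# on its own (as sd.read(1).decode(...) does) cannot succeed.
lead = b"\xc2"

try:
    lead.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = exc.reason  # the reason text shown in the traceback

# With the "ignore" error handler the undecodable byte is silently dropped.
ignored = lead.decode("utf-8", "ignore")

# Side effect: a valid two-byte character (U+00B5, micro sign) is also
# dropped when its bytes are decoded one at a time.
micro = "\u00b5".encode("utf-8")  # b"\xc2\xb5"
per_byte = "".join(
    micro[i:i + 1].decode("utf-8", "ignore") for i in range(len(micro))
)

print(repr(strict_result), repr(ignored), repr(per_byte))
```

In other words, "ignore" stops the crash, but with one-byte reads it also discards every non-ASCII character, valid or not.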

henning-schild commented 6 years ago

@TarjeiD Indeed, the introduction of decode() has the potential to raise errors.

I would suggest using errors="ignore" to make clear what the "ignore" stands for. Did you also try the other valid options, "backslashreplace" and "replace"? Would they mess up the output you are seeing? https://docs.python.org/3/howto/unicode.html#the-string-type

KcMeterCEC commented 6 years ago

@TarjeiD It worked for me, thanks a million!

TarjeiD commented 6 years ago

@henning-schild I have not tried any other options, since "ignore" solves my problem. The device I'm testing changes its baud rate during boot, so the invalid start is discarded anyway.

henning-schild commented 6 years ago

@TarjeiD I asked because I would like you to try them, in order to come up with a patch that does not just work for you but can be merged. To me it sounds like "ignore" has the potential to drop possibly relevant characters, while the other two might result in output that is no longer readable. Or you might open a PR where you choose the best option and possibly discuss why you prefer that one.
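To make the trade-off between the three error handlers concrete, here is a small comparison sketch. The sample bytes are copied from the hexdump reported in this thread (5d 3a 20 a0 2b a0 04, preceded by "[tu"); decoding them as UTF-8 is an assumption, since grabserial uses sys.stdout.encoding. Decoding with "backslashreplace" needs Python 3.5 or later.

```python
# Bytes from the reported serial data: ASCII text followed by binary junk.
sample = b"[tu]: \xa0+\xa0\x04"

# Decode the same bytes with each error handler mentioned in the thread.
results = {
    handler: sample.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}

for handler, text in results.items():
    print(f"{handler:>16}: {text!r}")
```

"ignore" drops the 0xa0 bytes without a trace, "replace" substitutes U+FFFD for each of them, and "backslashreplace" keeps their values visible as \xa0 escapes.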

mungewell commented 6 years ago

Came to report the same failure.

Traceback (most recent call last):
  File "grabserial", line 531, in <module>
    grab(sys.argv[1:])
  File "grabserial", line 419, in grab
    x = sd.read(1).decode(sys.stdout.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte

For me it happens with the following data from the serial port:

000007a0  79 43 6c 6f 63 6b 0d 0a  5b 55 54 4f 50 49 41 20  |yClock..[UTOPIA |
000007b0  49 4e 46 4f 5d 20 63 6c  6f 73 65 20 6d 6f 64 75  |INFO] close modu|
000007c0  6c 65 4e 61 6d 65 73 5b  74 75 5d 3a 20 a0 2b a0  |leNames[tu]: .+.| <------ here
000007d0  04 a0 2b e0 04 a0 2b 20  05 a0 2b 60 05 a0 2b e0  |..+...+ ..+`..+.|
000007e0  05 a0 2b 78 56 34 12 0d  0a 0d 0a 5b 41 54 5d 5b  |..+xV4.....[AT][|

Changing that line to x = sd.read(1).decode(sys.stdout.encoding, "ignore") worked for me; it stopped the crash, anyhow. Attachments: utf-8_fail.txt, utf-8_fail.zip

tbird20d commented 5 years ago

Hey everyone. Sorry it took me so long to look at this. I just did a big pass over grabserial to examine Unicode issues, and I added the 'ignore' decoding option for when the bytes are sent to sys.stdout. However, I also modified the code to save the bytes unmodified when a user writes the data to a separate output file (using the -o option).
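The approach described above can be sketched roughly as follows. This is not grabserial's actual code: the relay function and its parameters are hypothetical, the serial port is simulated with io.BytesIO, and the encoding is assumed to be UTF-8.

```python
import io


def relay(port, raw_log, display, encoding="utf-8"):
    """Copy bytes from port: raw to raw_log, decoded leniently to display.

    Hypothetical helper for illustration only. It stops on an empty read,
    which is a simplification (a real serial port with a timeout also
    returns b"" when no data arrived in time).
    """
    while True:
        chunk = port.read(1)
        if not chunk:
            break
        raw_log.write(chunk)  # output file keeps every byte unmodified
        display.write(chunk.decode(encoding, "ignore"))  # terminal copy


# Simulated serial data containing the bytes that caused the crash.
port = io.BytesIO(b"Names[tu]: \xa0+\xa0\x04\r\n")
raw_log = io.BytesIO()
display = io.StringIO()

relay(port, raw_log, display)
print(repr(display.getvalue()))
print(raw_log.getvalue())
```

The terminal copy loses the undecodable bytes, while the log retains them for later inspection with a hex viewer.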

This is about as good a solution to this problem as I could find.
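Where multi-byte characters matter for the on-screen output, one possible refinement is an incremental decoder, which buffers an incomplete sequence until the next byte arrives. This is an assumption offered for illustration, not something grabserial is stated to do; decode_stream is a made-up helper and UTF-8 is assumed.

```python
import codecs


def decode_stream(chunks, encoding="utf-8", errors="replace"):
    """Yield decoded text from an iterable of byte chunks of any size."""
    decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    for chunk in chunks:
        text = decoder.decode(chunk)  # may be "" while a sequence is pending
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # flush a dangling partial sequence
    if tail:
        yield tail


# A valid two-byte character followed by bytes of the kind seen in this thread.
data = b"\xc2\xb5s [tu]: \xa0+"
one_byte_chunks = [data[i:i + 1] for i in range(len(data))]

decoded = "".join(decode_stream(one_byte_chunks))
print(repr(decoded))
```

Even with one-byte reads, the valid character survives and only the stray 0xa0 byte is replaced with U+FFFD.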

Thanks for the bug report, and the description of the workaround.

Regards, -- Tim