laktak / extrakto

extrakto for tmux - quickly select, copy/insert/complete text without a mouse
MIT License

UnicodeDecodeError on start sometimes when using fzf & delta #75

Closed Frederick888 closed 3 years ago

Frederick888 commented 3 years ago

I was using fzf & delta to view the git history graph of eclipse.jdt.ls and bumped into the following exception:

Traceback (most recent call last):
  File "/Users/frederick/.tmux/plugins/extrakto/scripts/../extrakto.py", line 208, in <module>
    main(parser)
  File "/Users/frederick/.tmux/plugins/extrakto/scripts/../extrakto.py", line 141, in main
    text = sys.stdin.read()
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 513: invalid start byte

So I modified the code to the following and reproduced the case:

    text = None
    try:
        text = sys.stdin.read()
    except UnicodeDecodeError as e:
        f = open('/tmp/extrakto.bin', 'w+b')
        f.write(e.object)
        f.close()
        raise e

  File "/Users/frederick/.tmux/plugins/extrakto/scripts/../extrakto.py", line 215, in <module>
    main(parser)
  File "/Users/frederick/.tmux/plugins/extrakto/scripts/../extrakto.py", line 148, in main
    raise e
  File "/Users/frederick/.tmux/plugins/extrakto/scripts/../extrakto.py", line 143, in main
    text = sys.stdin.read()
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 15361: invalid start byte

(15361 is 0x3c01 btw)

00003b10: 2020 2020 2020 2020 2020 2020 20e2 9482               ...
00003b20: 0a20 202a 2066 3135 6438 6135 204a 696e  .  * f15d8a5 Jin
00003b30: 626f 2057 616e 6720 4164 6420 6665 6174  bo Wang Add feat
00003b40: 7572 653a 2061 7574 6f6d 6174 6963 2073  ure: automatic s
00003b50: 6f75 7263 6520 7265 736f 6c75 7469 6f6e  ource resolution
00003b60: 2066 6f72 2063 6c61 7373 6573 2069 6e20   for classes in 
00003b70: 6a61 7273 2077 6974 6820 6d61 7665 6e20  jars with maven 
00003b80: 636f 6f72 642e 2e20 e294 8220 e294 80e2  coord.. ... ....
00003b90: 9480 e294 80e2 9480 e294 80e2 9480 e294  ................
00003ba0: 80e2 9480 e294 80e2 9480 e294 80e2 9480  ................
00003bb0: e294 80e2 9480 e294 80e2 9480 e294 80e2  ................
00003bc0: 9480 e294 80e2 9480 e294 80e2 9480 e294  ................
00003bd0: 80e2 9480 e294 80e2 9480 e294 80e2 9480  ................
00003be0: e294 80e2 9480 e294 80e2 9480 e294 80e2  ................
00003bf0: 9480 e294 80e2 9480 e294 80e2 9480 e294  ................
00003c00: a080 e294 80e2 9480 e294 80e2 9480 e294  ................
00003c10: 80e2 9480 e294 80e2 9480 e294 80e2 9480  ................
00003c20: e294 80e2 9480 e294 80e2 9480 e294 80e2  ................
00003c30: 9480 e294 80e2 9480 e294 80e2 9480 e294  ................
00003c40: 80e2 9480 e294 80e2 9480 e294 80e2 9480  ................
00003c50: e294 80e2 9480 e294 80e2 9480 e294 80e2  ................
00003c60: 9480 e294 80e2 9480 e294 80e2 9480 e294  ................
00003c70: 80e2 9480 e294 80e2 9480 e294 80e2 9480  ................
00003c80: e294 80e2 9480 e294 80e2 9480 e294 80e2  ................
00003c90: 9480 e294 80e2 9480 e294 80e2 9480 e294  ................
00003ca0: 80e2 9480 e294 80e2 9480 e294 80e2 9480  ................
00003cb0: 2020 2020 2020 2020 2020 2020 2020 2020                  
00003cc0: 2020 2020 2020 2020 2020 2020 2020 2020                  
00003cd0: 2020 2020 2020 2020 2020 2020 2020 2020                  
00003ce0: 2020 2020 20e2 9482 0a20 202a 2037 6663       ....  * 7fc
00003cf0: 6431 6230 204a 696e 626f 2057 616e 6720  d1b0 Jinbo Wang 
00003d00: 5570 6461 7465 2074 6865 2073 7570 706f  Update the suppo
00003d10: 7274 6564 2066 6561 7475 7265 7320 6c69  rted features li
00003d20: 7374 2069 6e20 5245 4144 4d45 2028 3320  st in README (3 
00003d30: 6d6f 6e74 6873 2061 676f 2920 2020 2020  months ago)     

A 0xA0 byte (non-breaking space?) somehow got squeezed into a run of U+2500 (0xE29480, box drawings light horizontal) characters. Visually it looked alright:

image

Since terminal output sometimes contains weird control/drawing characters, I wonder whether it would be possible to ignore bytes that fail to decode?
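
The decode failure can be reproduced in isolation from the bytes around offset 0x3c00 in the dump. This is a minimal sketch that assumes only the bytes shown there; the variable names are illustrative and not part of extrakto. It suggests that the 0xA0 is consumed as the last byte of a valid three-byte sequence (0xE294A0, which is U+2520), and that the extra 0x80 after it is the byte the decoder rejects, which matches the reported position 15361 (0x3c01).

```python
# Bytes copied from the hex dump: "e2 94" ends the row at 0x3bf0,
# and the row at 0x3c00 starts with "a0 80", followed by more U+2500 characters.
sample = bytes.fromhex("e294a080e29480")

try:
    sample.decode("utf-8")
except UnicodeDecodeError as e:
    # e.object holds the raw input; e.start is the offset of the rejected byte.
    bad_offset = e.start
    bad_byte = e.object[e.start]
    reason = e.reason

# The first three bytes decode cleanly on their own.
first_char = sample[:3].decode("utf-8")

print(bad_offset, hex(bad_byte), reason, hex(ord(first_char)))
```

Here the offset is relative to the seven-byte sample, so the rejected byte sits at index 3, one past the 0xA0, just as 0x3c01 is one past 0x3c00 in the full dump.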

laktak commented 3 years ago

Thanks and a good suggestion!

Can you try replacing this line

    # text = sys.stdin.read()
    text = sys.stdin.buffer.read().decode("utf-8", "ignore")

to see if it fixes the problem?

Frederick888 commented 3 years ago

@laktak Yes it did fix it. Thank you.
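
For reference, the effect of the suggested change can be sketched outside of extrakto. The helper `read_text` below is hypothetical and uses an in-memory stream in place of `sys.stdin.buffer`; the `"replace"` handler is shown only for comparison and was not proposed in this thread.

```python
import io


def read_text(stream, errors="ignore"):
    """Read all bytes from a binary stream and decode them as UTF-8.

    `errors` is passed to bytes.decode(); "ignore" drops invalid bytes.
    """
    return stream.read().decode("utf-8", errors)


# A stray 0x80 between two U+2500 characters, similar to the reported input.
raw = "\u2500".encode("utf-8") + b"\x80" + "\u2500".encode("utf-8")

ignored = read_text(io.BytesIO(raw))               # invalid byte is dropped
replaced = read_text(io.BytesIO(raw), "replace")   # invalid byte becomes U+FFFD

print(repr(ignored), repr(replaced))
```

With `"ignore"` the undecodable byte silently disappears, which suits a tool that only extracts selectable text from the pane; `"replace"` would instead leave a visible U+FFFD marker in the output.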