komsit37 / sublime-q

Sublime Text Plugin for q/kdb
MIT License

latin-1 / utf-8 codec can't encode/decode #21

Closed atf1206 closed 4 years ago

atf1206 commented 4 years ago

This may be intended, just FYI: I'm getting "Error in QSendRawCommand.sendAndUpdateStatus:" followed by either "'latin-1' codec can't encode characters..." (on send) or "'utf-8' codec can't decode byte..." (on receive) when I try to send or receive characters in the range \200 to \371. For example, `$"\201" fails. I think this worked until recently; I'm not sure whether you changed the char encoding intentionally.

komsit37 commented 4 years ago

Thanks for reporting the issue. This looks like a Python string decoding issue. I'm not familiar with this, so I'll outline the problem here; if anyone knows how to decode this properly, please let me know. You can try the following in the Sublime Text console (View > Show Console).

  1. After sending `$"\201" to kdb, we receive the response as Python bytes, shown in (1).
  2. We need to convert these bytes to a string for output, but Python can't decode them as UTF-8, shown in (2).
    (1)>>> x=b'`\x81\n'
    >>> x
    b'`\x81\n'
    >>> print(x)
    b'`\x81\n'
    (2)>>> x.decode('utf-8')
    Traceback (most recent call last):
    File "<string>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
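For comparison, here is a minimal standalone sketch (plain Python 3, nothing sublime-q specific) that decodes the same bytes from (1) in three ways. latin-1 never fails because it maps each byte 0x00-0xFF to the code point with the same value, while strict UTF-8 rejects 0x81 as a start byte; `errors='replace'` avoids the exception but loses the original byte. The variable names are only for illustration.

```python
# Bytes received from kdb for `$"\201", as in (1) above.
raw = b'`\x81\n'

# latin-1 maps every byte 0x00-0xFF to the code point of the same value,
# so this never raises: 0x81 becomes the control character U+0081.
as_latin1 = raw.decode('latin-1')

# Strict UTF-8 fails exactly as in (2): 0x81 is not a valid start byte.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors='replace' swaps the bad byte for U+FFFD instead of raising,
# so the original value cannot be recovered afterwards.
as_utf8_lossy = raw.decode('utf-8', errors='replace')

print(as_latin1 == '`\x81\n')              # True
print(as_latin1.encode('latin-1') == raw)  # True: lossless round trip
print(strict_ok)                           # False
print(as_utf8_lossy == '`\ufffd\n')        # True: 0x81 replaced by U+FFFD
```

The catch is that latin-1 only gives the "right" characters if the bytes really were latin-1 to begin with; it just guarantees no exception.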

> I think this worked until recently; not sure if you changed the char encoding intentionally here.

No, not intentionally. I don't think I've changed the decoding-related code either (I refactored it, but the logic didn't change). For reference, the decoding code is here: https://github.com/komsit37/sublime-q/blob/master/util.py#L11

atf1206 commented 4 years ago

Hi komsit37, I figured this one out. The version of qPython that sublime-q is currently using encodes and decodes strings as "latin-1" rather than UTF-8, and latin-1 can only represent the 256 code points U+0000 to U+00FF.
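To make the latin-1 limit concrete, here is a small standalone check. The helper name `fits_latin1` is hypothetical and not part of sublime-q or qPython. Characters up to U+00FF encode to a single byte; anything above raises `UnicodeEncodeError`, the same kind of "'latin-1' codec can't encode" error reported on send. UTF-8 can encode any character but uses more than one byte for non-ASCII ones.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text is in U+0000..U+00FF."""
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        # Raised for any character above U+00FF, e.g. the euro sign.
        return False
    return True

print(fits_latin1('caf\u00e9'))   # True: 'é' is U+00E9
print(fits_latin1('\u20ac100'))   # False: '€' is U+20AC
print('\u00e9'.encode('utf-8'))   # b'\xc3\xa9' (2 bytes in UTF-8)
print('\u20ac'.encode('utf-8'))   # b'\xe2\x82\xac' (3 bytes in UTF-8)
```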

This can be patched without too much trouble: it just requires a small change to how the byte length of a string is calculated, since in UTF-8 a single character can take more than one byte. However, there is another option: the newest version of qPython still defaults to latin-1, but this can be overridden in qconnection by passing encoding = 'UTF-8'.
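A minimal sketch of why the length calculation matters, assuming the standard kdb+ IPC layout for a char vector in a little-endian message (1-byte type 10, 1-byte attribute, 4-byte element count, then the bytes). `serialize_char_vector` is a hypothetical helper for illustration only, not qPython's actual writer, and it leaves out the 8-byte message header.

```python
import struct

Q_CHAR_VECTOR = 10  # q type code for a char vector (a q string)

def serialize_char_vector(text: str, encoding: str = 'utf-8') -> bytes:
    """Hypothetical sketch: encode text as a q char vector payload.

    Assumed layout: type byte, attribute byte (0 = none),
    4-byte little-endian element count, then the encoded bytes.
    """
    data = text.encode(encoding)
    # The count must be the number of encoded bytes, not len(text):
    # the two differ as soon as a character needs more than one byte.
    header = struct.pack('<bBi', Q_CHAR_VECTOR, 0, len(data))
    return header + data

# 'é' is one character but two bytes in UTF-8, so the count field is 2.
print(len('\u00e9'), serialize_char_vector('\u00e9'))
```

With latin-1 every character is exactly one byte, which is why code written against latin-1 could get away with `len(text)`.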

What do you think about upgrading to the latest qPython? Either way, I can create a pull request with the update and the UTF-8 change, but because it's such a core change to the code, I suggest we test it thoroughly before merging.

komsit37 commented 4 years ago

Cool, yup, we could try upgrading qPython. I checked the diffs, and the upgrade shouldn't be too bad (as long as we don't need to change the numpy dependency part). Either 1.2.2 or 2.0.0 should be OK:
https://github.com/exxeleron/qPython/compare/qPython-1.1.0...qPython-1.2.2
https://github.com/exxeleron/qPython/compare/qPython-1.2.2...2.0.0

Agreed, we would need some tests. That also shouldn't be too bad, since we don't use many data types; we mostly just decode to strings.

komsit37 commented 4 years ago

Fixed in https://github.com/komsit37/sublime-q/pull/28