osquery / osquery-python

Python bindings for osquery's Thrift API

UnicodeDecodeError in Python3 #61

Open JarryShaw opened 5 years ago

JarryShaw commented 5 years ago

This is an issue with thrift (a dependency of this library); an issue has already been filed with that project and is still open.

Environment:

When running a query, a UnicodeDecodeError is raised from thrift.compat.binary_to_str with the message "'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte". This happens because the bin_val parameter is decoded as UTF-8, whereas in my case its encoding is actually "gbk".
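For readers who want to see the failure in isolation, here is a minimal sketch that does not involve thrift or osquery at all. The byte string is an illustrative value chosen to be valid GBK but invalid UTF-8; it is an assumption, not data taken from the original report.

```python
# Two bytes that form a valid GBK character but are not valid UTF-8.
# (Illustrative value only; the actual osquery response is not shown in the report.)
raw = b"\xc3\xfb"

utf8_error = None
try:
    # Decoding legacy-encoded bytes as UTF-8 fails, as in the traceback.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc
    print(exc)

# Decoding the same bytes with the legacy code page succeeds.
text = raw.decode("gbk")
```

The point is only that the bytes themselves are fine; the error comes from assuming UTF-8 for data produced in a different encoding.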

Maybe try patching the thrift source code and shipping it as a vendored package with the distribution? (just as pipenv and other projects do)

theopolis commented 5 years ago

@jarryshaw, did you have a chance to follow up on the comments on the Thrift bug report?

JarryShaw commented 5 years ago

It's been quite a long time, and I've only recently been trying to reproduce the issue. Btw, I just found two other issues 🤦‍♂ I'll make a pull request for one of them.

JarryShaw commented 5 years ago

Also, FYI, you can find the failing query in THRIFT-4677.

It appears to be linked to an internal Windows issue. Some of the Chinese text is encoded as UTF-8, for example in os_version, whilst some of it is encoded with the system legacy encoding (cp936/gbk/gb2312 in my case), for example in scheduled_tasks.
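One possible way to cope with such mixed encodings on the client side is to try UTF-8 first and fall back to the legacy code page. The sketch below is only an illustration: `decode_mixed` is a hypothetical helper, not part of thrift or osquery-python, and the `"gbk"` default is an assumption matching the system described here.

```python
def decode_mixed(raw: bytes, fallback: str = "gbk") -> str:
    """Decode bytes as UTF-8, falling back to a legacy encoding.

    Hypothetical helper for illustration; `fallback` should match the
    system's legacy code page (cp936/gbk in the case described above).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Replace undecodable bytes instead of raising a second time.
        return raw.decode(fallback, errors="replace")


# UTF-8 data passes through unchanged; legacy-encoded data uses the fallback.
utf8_value = decode_mixed("osquery 版本".encode("utf-8"))
legacy_value = decode_mixed(b"\xc3\xfb")
```

This is a heuristic rather than a fix: some legacy-encoded byte sequences also happen to be valid UTF-8 and would be decoded incorrectly without any error.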

Also, according to James, a contributor to Thrift, "Thrift only handles strings as UTF8 internally." Maybe this is an issue with osquery's internal data schema, or a design flaw in Thrift.