willtrnr / pyxlsb

Excel 2007+ Binary Workbook (xlsb) reader for Python
GNU Lesser General Public License v3.0

Chinese text garbled #11

Closed: lves-han closed this issue 6 years ago

lves-han commented 6 years ago

Hi, when I use pyxlsb to read a workbook that contains Chinese text, it always returns garbled Chinese characters. I can't fix it by re-encoding with GBK, UTF-8, or GB18030. What can I do?

willtrnr commented 6 years ago

Strings are decoded as latin-1 right now.
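The snippet below illustrates why this garbles Chinese but not plain English text, and why re-decoding the result as GBK, UTF-8, or GB18030 can't repair it. It is a minimal sketch, assuming (as the next comment suggests) that XLSB stores cell strings as UTF-16LE and that the old code decoded those bytes as latin-1 and then stripped NUL characters. `decode_latin1_strip_nulls` and `decode_utf16le` are hypothetical helpers for illustration, not pyxlsb functions.

```python
def decode_latin1_strip_nulls(raw: bytes) -> str:
    """Hypothetical helper mimicking the latin-1 decode + NUL stripping described in the thread."""
    return raw.decode("latin-1").replace("\x00", "")


def decode_utf16le(raw: bytes) -> str:
    """Decode the same bytes as UTF-16 little-endian."""
    return raw.decode("utf-16-le")


# ASCII in UTF-16LE is each byte followed by 0x00, so stripping NULs
# happens to produce the right text.
ascii_bytes = "Total".encode("utf-16-le")
print(decode_latin1_strip_nulls(ascii_bytes))  # Total

# Chinese characters use both bytes of each code unit, so latin-1 turns
# every character into two unrelated Latin-1 characters.
chinese_bytes = "中文".encode("utf-16-le")
print(repr(decode_latin1_strip_nulls(chinese_bytes)))  # '-N\x87e'

# Decoding as UTF-16LE recovers the original text.
print(decode_utf16le(chinese_bytes))  # 中文
```

The garbled output was never GBK, UTF-8, or GB18030 text in the first place, which is why trying those encodings afterwards doesn't help.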

I've started adding more flexibility for the encoding used but it's not quite there yet.

DavidCooper commented 6 years ago

The fix is straightforward: instead of 'latin-1' you need to use 'UTF-16', the encoding Microsoft uses for strings in the XLSB format. You also need to remove buff.decode(self._enc).replace('\x00', '') and instead return buff.decode('UTF-16'). Hoping you can push a fix for this soon, so I can use it in my projects without having to patch the installation :)
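A sketch of what such a string reader could look like follows. It is not pyxlsb's actual code. It assumes each string is stored as a 4-byte little-endian count of 2-byte UTF-16 code units followed by that many code units of UTF-16LE data. `read_wide_string` and `encode_wide_string` are hypothetical names. The decode uses `'utf-16-le'` rather than plain `'utf-16'` so the byte order is explicit and no BOM handling is involved.

```python
import io
import struct
from typing import BinaryIO, Optional


def read_wide_string(stream: BinaryIO) -> Optional[str]:
    """Hypothetical reader: 4-byte LE code-unit count, then count * 2 bytes of UTF-16LE.

    Returns None if the stream ends before the full string is read.
    """
    header = stream.read(4)
    if len(header) != 4:
        return None
    (count,) = struct.unpack("<I", header)
    data = stream.read(count * 2)
    if len(data) != count * 2:
        return None
    # Decode directly as UTF-16LE: no latin-1 step and no NUL stripping.
    return data.decode("utf-16-le")


def encode_wide_string(text: str) -> bytes:
    """Hypothetical helper that builds the same layout, for testing the reader."""
    data = text.encode("utf-16-le")
    return struct.pack("<I", len(data) // 2) + data


buf = io.BytesIO(encode_wide_string("中文 test"))
print(read_wide_string(buf))  # 中文 test
```

Returning `None` on truncated input is just one choice for this sketch. A real reader might raise an exception instead.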

willtrnr commented 6 years ago

@lves-han @DavidCooper Since this was a simple fix, I've released v1.0.4 on PyPI with the fix for this issue. You should be able to update your dependencies now.

DavidCooper commented 6 years ago

Many thanks!