zotero / citeproc-js-server

Web service to generate citations and bibliographies using citeproc-js

fix: specify encoding utf-8 when opening a file #44

Closed northword closed 2 years ago

northword commented 2 years ago

When a CSL file contains non-ASCII characters such as Chinese text, as china-national-standard-gb-t-7714-2015-numeric.csl does, opening it without specifying an encoding fails on systems where Python defaults to the gbk codec:

PS D:\Code\zotero-cn\citeproc-js-server-master> python .\xmltojson.py ./csl ./csljson
converting ./csl\china-national-standard-gb-t-7714-2015-numeric.csl to ./csljson\china-national-standard-gb-t-7714-2015-numeric.csl
Traceback (most recent call last):
  File "D:\Code\zotero-cn\citeproc-js-server-master\xmltojson.py", line 92, in <module>
    doc = w.makedoc(open(fullname).read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 393: illegal multibyte sequence

With the change in this commit, the script should work. For example, the suffix of the year date-part is converted to \u5e74, as shown in the figure below.

image
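
A minimal sketch of the idea behind the fix, not the actual diff: passing encoding="utf-8" to open() makes the read independent of the platform default. The helper name read_text_utf8 and the sample fragment are hypothetical, and json.dumps stands in for whatever escaping xmltojson.py performs.

```python
import json
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8, ignoring the platform's default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical CSL-like fragment with a Chinese year suffix (U+5E74).
sample = '<date-part name="year" suffix="年"/>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csl")
    # Write the fixture as UTF-8 so the test does not depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    text = read_text_utf8(path)

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
escaped = json.dumps(text)
print(escaped)
```

Without the encoding argument, the same open() call would use the locale's preferred encoding, which is gbk on a Chinese-language Windows install and is what produced the traceback above.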

dstillman commented 2 years ago

For what it's worth, this would've already been working fine in Python 3 on most Linux/Mac systems.

If you're using Windows, you may need to enable Python UTF-8 mode, but I would recommend doing that anyway.

No harm in making this more explicit, though, so I've merged this. Thanks!
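
For anyone unsure which default their interpreter uses, the following sketch reports whether Python UTF-8 mode is active and which encoding open() would fall back to. The function name describe_default_encoding is hypothetical; sys.flags.utf8_mode and locale.getpreferredencoding are standard-library calls available in Python 3.7 and later.

```python
import locale
import sys


def describe_default_encoding():
    """Report UTF-8 mode and the encoding open() uses when none is given."""
    return {
        # True when started with -X utf8 or with PYTHONUTF8=1 in the environment.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The locale's preferred encoding, without changing the locale.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


info = describe_default_encoding()
print(info)
```

UTF-8 mode can be enabled by setting the PYTHONUTF8=1 environment variable or by running python -X utf8.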