sphinx-contrib / emojicodes

An extension to use emoji codes in your Sphinx documentation! 😍
https://sphinxemojicodes.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
50 stars, 15 forks

'charmap' codec can't decode byte X in position Y: character maps to <undefined> #9

Closed (trustin closed this issue 3 years ago)

trustin commented 5 years ago

On a system whose default encoding is not UTF-8 (cp1252 on Windows in the traceback below), sphinxemoji fails with a UnicodeDecodeError:

  File "C:\projects\sphinx-binary\build\venv\lib\site-packages\sphinx\cmd\build.py", line 284, in build_main
    app.build(args.force_all, filenames)
  ...
  File "C:\projects\sphinx-binary\build\venv\lib\site-packages\docutils\transforms\__init__.py", line 172, in apply_transforms
    transform.apply(**kwargs)
  File "C:\projects\sphinx-binary\build\venv\lib\site-packages\sphinxemoji\sphinxemoji.py", line 26, in apply
    replacements = json.load(open(codes))
  File "C:\Python36\lib\json\__init__.py", line 296, in load
    return loads(fp.read(),
  File "C:\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 18: character maps to <undefined>
Encoding error:
'charmap' codec can't decode byte 0x8d in position 18: character maps to <undefined>

The problem can be fixed by passing the encoding='utf-8-sig' parameter to the open(codes) call, so that the file is decoded as UTF-8 rather than with the system default encoding.
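A minimal sketch of why the explicit encoding matters. The `load_codes` helper, the file contents, and the temporary path are hypothetical stand-ins for illustration, not the real codes.json or the extension's actual code:

```python
import json
import os
import tempfile


def load_codes(path):
    """Load a JSON mapping, decoding the file as UTF-8 regardless of locale.

    'utf-8-sig' also strips a leading byte order mark if one is present.
    """
    with open(path, encoding="utf-8-sig") as fp:
        return json.load(fp)


# Hypothetical stand-in for codes.json: one emoji code mapped to its character.
sample = {"|:heart_eyes:|": "\U0001F60D"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "codes.json")
    # Write raw UTF-8 bytes; ensure_ascii=False stores the emoji itself.
    with open(path, "wb") as fp:
        fp.write(json.dumps(sample, ensure_ascii=False).encode("utf-8"))

    loaded = load_codes(path)

    # Decoding the same bytes as cp1252 fails, as in the traceback above.
    with open(path, "rb") as fp:
        raw = fp.read()
    try:
        raw.decode("cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
```

Using a `with` block also closes the file handle, which the bare `json.load(open(codes))` call leaves to the garbage collector.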

trustin commented 5 years ago

Patch:

diff -urN dist.orig/sphinxemoji/sphinxemoji.py dist/sphinxemoji/sphinxemoji.py
--- dist.orig/sphinxemoji/sphinxemoji.py    2019-07-11 09:13:36.000000000 +0900
+++ dist/sphinxemoji/sphinxemoji.py 2019-07-26 23:40:15.282033254 +0900
@@ -23,7 +23,7 @@
         config = self.document.settings.env.config
         settings, source = self.document.settings, self.document['source']
         codes = resource_filename(__name__, 'codes.json')
-        replacements = json.load(open(codes))
+        replacements = json.load(open(codes, encoding='utf-8-sig'))
         to_handle = (set(replacements.keys()) -
                      set(self.document.substitution_defs))

Peque commented 5 years ago

Thanks for reporting this issue. Does it work fine for you if you set encoding='utf'?

I will try to add AppVeyor to CI to check Windows compatibility.

trustin commented 5 years ago

Yes, it works fine with encoding='utf-8-sig'.

Peque commented 5 years ago

I meant with encoding='utf', without the -8-sig (I have not tried it). :innocent:

trustin commented 5 years ago

I haven't tried it yet either, but is it a valid encoding at all? (I'm not a Python expert, so... :sweat_smile:)
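Whether a name is a recognized encoding can be checked with the standard library's `codecs.lookup`, which raises `LookupError` for unknown names. The sketch below assumes CPython's standard alias table, where 'utf' is an alias of UTF-8. The `bom_bytes` sample is made up for illustration. It shows the one practical difference between the two names: only 'utf-8-sig' strips a leading byte order mark.

```python
import codecs

# lookup() normalizes aliases and returns the canonical codec information.
utf_info = codecs.lookup("utf")
sig_info = codecs.lookup("utf-8-sig")

print(utf_info.name)  # canonical name behind the 'utf' alias
print(sig_info.name)

# Hypothetical input: a small JSON document preceded by a UTF-8 BOM.
bom_bytes = codecs.BOM_UTF8 + b'{"a": 1}'

plain = bom_bytes.decode("utf")           # BOM survives as U+FEFF
stripped = bom_bytes.decode("utf-8-sig")  # BOM is removed
```

For a file without a BOM the two encodings decode identically, so either name would avoid the cp1252 failure reported above.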

honzajavorek commented 4 years ago

BTW, GitHub Actions offers free CI for Linux, Windows, and macOS. Perhaps the project could use it to test for problems like this. If someone sets up a test, I'm willing to contribute the Actions config file for running it automatically on every push.

Peque commented 4 years ago

@honzajavorek It would be great to add some CI checks.

There are currently no tests in this project, but we could add some checks for the documentation. I opened an issue; feel free to contribute the required config file. :blush: