matrix-org / python-canonicaljson

Canonical JSON
Apache License 2.0

Pretty-printed JSON is ASCII-encoded whereas it is otherwise UTF-8-encoded, causing errors #33

Closed: reivilibre closed this issue 4 years ago

reivilibre commented 4 years ago

With Synapse, I run into an error when doing an initial sync with curl, because Synapse switches to pretty-printed output for curl requests:

Aug 24 14:14:02 sallie.librepush.net synapse[3123612]: During handling of the above exception, another exception occurred:
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]: Traceback (most recent call last):
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:   File "/home/synapse/venv/lib/python3.8/site-packages/synapse/http/server.py", line 233, in _async_render_wrapper
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:     self._send_response(request, code, response)
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:   File "/home/synapse/venv/lib/python3.8/site-packages/synapse/http/server.py", line 289, in _send_response
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:     respond_with_json(
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:   File "/home/synapse/venv/lib/python3.8/site-packages/synapse/http/server.py", line 536, in respond_with_json
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:     json_bytes = encode_pretty_printed_json(json_object) + b"\n"
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:   File "/home/synapse/venv/lib/python3.8/site-packages/canonicaljson.py", line 96, in encode_pretty_printed_json
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]:     return _pretty_encoder.encode(json_object).encode("ascii")
Aug 24 14:14:02 sallie.librepush.net synapse[3123612]: UnicodeEncodeError: 'ascii' codec can't encode characters in position 16800-16801: ordinal not in range(128)

On the other hand, if I use curl's -A flag to change the User-Agent, I get compact (non-pretty-printed) output and the request succeeds.
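To make the asymmetry concrete, here is a minimal Python sketch that reproduces the failure. It is a hypothetical reconstruction, not canonicaljson's or Synapse's actual code. It assumes both encoders use `ensure_ascii=False`. This is consistent with the traceback, because an ASCII encode could never fail otherwise. It also assumes that the server picks pretty output whenever the User-Agent contains "curl". The helper names `encode_compact`, `encode_pretty_buggy` and `respond` are made up for illustration.

```python
import json

# Hypothetical encoders. ensure_ascii=False means non-ASCII characters are
# emitted as-is rather than as \uXXXX escapes (assumed, see lead-in).
_compact_encoder = json.JSONEncoder(
    ensure_ascii=False, separators=(",", ":"), sort_keys=True
)
_pretty_encoder = json.JSONEncoder(ensure_ascii=False, indent=4, sort_keys=True)


def encode_compact(obj):
    # Compact path: the str is encoded as UTF-8, so any character survives.
    return _compact_encoder.encode(obj).encode("utf-8")


def encode_pretty_buggy(obj):
    # Pretty path: the str is encoded as ASCII, mirroring the failing line
    # in the traceback; raises UnicodeEncodeError on non-ASCII text.
    return _pretty_encoder.encode(obj).encode("ascii")


def respond(obj, user_agent):
    # Hypothetical stand-in for the server choosing pretty output for curl.
    if "curl" in user_agent:
        return encode_pretty_buggy(obj)
    return encode_compact(obj)


event = {"body": "café"}
print(respond(event, "Mozilla/5.0"))  # compact UTF-8 bytes; succeeds
try:
    respond(event, "curl/7.68.0")
except UnicodeEncodeError as exc:
    print("pretty-printing failed:", exc.reason)
```

Changing the User-Agent only changes which branch runs. This matches the observation that `curl -A ...` works around the error without fixing it.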

clokep commented 4 years ago

I think the fix is just to change the encoding used for pretty-printing to UTF-8, but I don't know why ASCII was chosen in the first place.
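A sketch of the proposed fix follows. It keeps the function name `encode_pretty_printed_json` from the traceback, but the encoder settings (`ensure_ascii=False`, `indent=4`, `sort_keys=True`) are assumptions and are not copied from canonicaljson. The only substantive change is encoding the result as UTF-8 instead of ASCII, which matches the compact path.

```python
import json

# Assumed encoder settings; only the final .encode("utf-8") is the fix.
_pretty_encoder = json.JSONEncoder(ensure_ascii=False, indent=4, sort_keys=True)


def encode_pretty_printed_json(obj):
    """Return indented, key-sorted JSON as UTF-8 bytes."""
    # UTF-8 can represent every str, so this can no longer raise
    # UnicodeEncodeError the way .encode("ascii") did.
    return _pretty_encoder.encode(obj).encode("utf-8")


out = encode_pretty_printed_json({"body": "café", "age": 1})
print(out.decode("utf-8"))
```

An alternative would be `ensure_ascii=True`, which escapes non-ASCII characters as `\uXXXX` and keeps the ASCII encode safe. However, the pretty and compact outputs would then differ in more than whitespace, so switching to UTF-8 looks like the more consistent choice.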

clokep commented 4 years ago

Looks like it has always been this way, since aa84dddb7109e42921d5d54a417d421e77daf634 (the initial commit).

clokep commented 4 years ago

This seems to have hit a few people quite quickly, so we should probably do a release that includes the fix.