Closed richvdh closed 4 years ago
Did a little experimenting, and it turns out that:

- simplejson will happily encode `bytes` by default, and
- Synapse passes `bytes` to canonicaljson in some places.

So if canonicaljson is changed to use the stdlib `json`, either it'll have to decode bytestrings it gets, or Synapse needs a patch to stop passing `bytes` to canonicaljson.
Actually it's even worse than that: some of the C-S handlers also return `bytes`. I just found that `/voipServer` does.
To demonstrate the problem:

```python
>>> import simplejson
>>> simplejson.dumps(b"abc")
'"abc"'
>>> import json
>>> json.dumps(b"abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable
```
I think it was mentioned in `#synapse-dev:matrix.org`, but I believe the first step of this is to fix the places where Synapse passes `bytes` into canonicaljson.
Next steps on this:

- fix the places where Synapse passes `bytes` objects into `json.dumps` or `canonicaljson.encode`: `/voipServer` is the only one I'm aware of
- make a canonicaljson that uses `json` rather than `simplejson`
- switch Synapse over to that canonicaljson; this will probably break a number of tests
- release canonicaljson, thus fixing https://github.com/matrix-org/python-canonicaljson/issues/23

While we're working on canonicaljson, assuming we decide it is safe to switch to stock `json`, it would be good to revert the non-test parts of https://github.com/matrix-org/python-canonicaljson/pull/9, which, as per the comments there, are not required with stock `json`.
Looks like we also directly use `simplejson` in a bunch of places in Synapse. I'll clean those up too.
The build in #7803 passed, which means that current develop can run with the standard library JSON module.
I don't think we can just merge matrix-org/python-canonicaljson#24 and release canonicaljson, though, since older Synapses might upgrade to this newer version of canonicaljson, which would break them.
We could release a Synapse which pins canonicaljson to at most the current version, then release the new canonicaljson and remove the upper bound in a subsequent Synapse? Not sure if there's a better way to do that.
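In `setup.py` terms, that two-step dance might look something like this (the version numbers here are hypothetical, purely to illustrate the shape of the pin):

```python
# Hypothetical version numbers for the two-step release dance.

# Step 1: a Synapse release that caps canonicaljson below the breaking version:
install_requires_step1 = ["canonicaljson>=1.1.3,<1.2.0"]

# Step 2: once that Synapse is widespread, release the new canonicaljson
# and lift the cap in the next Synapse release:
install_requires_step2 = ["canonicaljson>=1.2.0"]

print(install_requires_step1, install_requires_step2)
```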
A few more options from Rich (see conversation):
- make canonicaljson import one of two libraries depending on what you set some global variable to
This seems plausible, although it is a bit ugly.
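A minimal sketch of that first option, roughly the shape that canonicaljson's `set_json_library` takes (the details here are illustrative, not the library's actual code):

```python
import json

_encoder = None

def set_json_library(json_lib):
    """Rebuild the canonical encoder from whichever json module is chosen
    (stdlib json, simplejson, ...)."""
    global _encoder
    _encoder = json_lib.JSONEncoder(
        ensure_ascii=False, separators=(",", ":"), sort_keys=True
    )

def encode_canonical_json(obj):
    # Canonical form: sorted keys, no extra whitespace, UTF-8 bytes.
    return _encoder.encode(obj).encode("utf-8")

# default to the standard library
set_json_library(json)

print(encode_canonical_json({"b": 1, "a": 2}))  # b'{"a":2,"b":1}'
```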
- flip around which level picks the json impl. So rather than doing `from canonicaljson import json` everywhere, have Synapse pass in a json impl when it invokes CJ
This would work, but means we'll have to constantly recreate JSON encoders, which would probably have a performance impact. (Or I guess we could have an initialization function which would get called to initialize the encoders or something?)
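The trade-off between those two shapes can be sketched as follows (a hypothetical API, just to show why an initialization step avoids rebuilding encoders on every call):

```python
import json

# "pass the impl in" per call: a fresh JSONEncoder is built every time,
# which is the performance concern mentioned above.
def encode_fresh(obj, json_impl=json):
    return json_impl.JSONEncoder(sort_keys=True, separators=(",", ":")).encode(obj)

# ...versus an explicit initialization function that builds it once:
_encoder = None

def init_json(json_impl=json):
    global _encoder
    _encoder = json_impl.JSONEncoder(sort_keys=True, separators=(",", ":"))

def encode_cached(obj):
    return _encoder.encode(obj)

init_json(json)
assert encode_fresh({"a": 1}) == encode_cached({"a": 1})
```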
- have a whole new library called canonicaljson2

This would work fine, but will require changing the package and module names, and updating Synapse (and all of its imports) to use `canonicaljson2`.
- vendor the 10 lines of python that we actually need into Synapse
I assume there was a good reason to have this be a separate library in the first place, not sure if those reasons are still valid.
I think the easiest way forward is the first option.

The major downside of any of the options that keep support for older Synapses in canonicaljson is that matrix-org/python-canonicaljson#24 cannot be merged, though we can tweak things so that we still get most of the benefits. The only way I see around that is releasing it as a canonicaljson2 library.
My current plan for this is:
I think this is tenable and isn't too much of a faff.
While working on #7670 I realized that this isn't fully done. We do use the standard library JSON for all canonicaljson operations, but not for other operations.
There's a handful of places where we `from canonicaljson import json`, which actually ends up with the simplejson implementation. Two ways to fix this:

- replace those imports with a plain `import json`
- update the `canonicaljson.json` variable in the `set_json_library` function

Any thoughts on which of these is the "right" way to go?
I'm inclined towards the former. Seems simpler?
That was my thought too. I think importing from canonicaljson is actually just a bit confusing. Especially now that we explicitly declare we want to use the json from the standard library.
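The former fix is entirely mechanical, e.g.:

```python
# before: picks up whichever implementation canonicaljson re-exports
# (currently simplejson):
#
#   from canonicaljson import json

# after: be explicit that we want the standard library
import json

print(json.dumps({"ok": True}))  # {"ok": true}
```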
I found my benchmark data, and it looks like stdlib json is faster than simplejson for dumping data nowadays. Using Python 3.7.5 (seconds per loop; smaller is better):
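A comparison along those lines can be reproduced with `timeit`; the payload below is made up for illustration and is not the original benchmark data:

```python
import json
import timeit

# a rough stand-in for a Matrix-ish payload (hypothetical)
payload = {
    "events": [
        {"type": "m.room.message", "content": {"body": "x" * 100}}
        for _ in range(100)
    ]
}

def bench(dumps, number=200):
    # total seconds for `number` dumps of the payload; smaller is better
    return timeit.timeit(lambda: dumps(payload), number=number)

stdlib_time = bench(json.dumps)
print(f"stdlib json: {stdlib_time:.4f}s")

try:
    import simplejson
    print(f"simplejson:  {bench(simplejson.dumps):.4f}s")
except ImportError:
    pass  # simplejson not installed; only the stdlib figure is printed
```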