Open palango opened 5 years ago
Hrm.
python 2.7 improved the performance of the stdlib json serialiser, but it did not integrate all of the optimisations from simplejson
, and as the comment says, I still found it to be substantially slower than simplejson.
I'd therefore be reluctant to simply discard simplejson without further benchmarking. I'd be open to dropping the dep on simplejson if it could be shown that on recent (say, python 3.5) versions of the stdlib the performance was comparable with simplejson
.
I'd also consider making the dependency optional, but am wary of introducing strange effects like that seen in https://github.com/kennethreitz/requests/issues/3052.
I'm not quite clear on what problem you're trying to solve: is simplejson
failing to parse valid json?
I guess a final option would be to use json
by default but allow the caller to pass in an alternative json
implementation as a keyword param.
I'm not quite clear on what problem you're trying to solve: is simplejson failing to parse valid json?
No, the problem we experience is ultimately to blame on https://github.com/kennethreitz/requests/issues/3052 -- when trying to handle parsing exceptions, we mismatch on the exception type, because requests
uses simplejson
if available. Since pulling matrix-synapse
as a dependency, this is the case now.
Unfortunately it doesn't look like https://github.com/kennethreitz/requests/pull/3056 will ever make it into the requests
2.x series, or be released before 2020 (see yellow note here), so politely asking to get rid of the dependency is our way of avoiding monkey patching or code arounds.
I guess a final option would be to use
json
by default but allow the caller to pass in an alternative json implementation as a keyword param.
Allowing simplejson
as an extra could be a possibility, too, no?
I'd therefore be reluctant to simply discard simplejson without further benchmarking. I'd be open to dropping the dep on simplejson if it could be shown that on recent (say, python 3.5) versions of the stdlib the performance was comparable with
simplejson
.
According to https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/ json
in python 3.5 is slightly faster than simplejson
, but slower in python 2.7
. SO I guess it depends on how important python 2.7 is for the user base.
Allowing simplejson as an extra could be a possibility, too, no?
That sounds like a good solution, keeping the option to use simplejsonn
for python 2.7 users, but defaulting to json
.
According to https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/ json in python 3.5 is slightly faster than simplejson, but slower in python 2.7. SO I guess it depends on how important python 2.7 is for the user base.
I looked at those metrics when I did the benchmarking I referred to earlier. What I found is that the performance of the various json libraries varied dramatically according to other options. In particular, stock json
was much slower than simplejson
with sort_keys=True
(as used by canonicaljson
). So I'm afraid those metrics aren't much use here.
Allowing simplejson as an extra could be a possibility, too, no?
That sounds like a good solution, keeping the option to use simplejsonn for python 2.7 users, but defaulting to json.
Isn't this what requests
did, and which caused you all sorts of problems? I'm not enthusiastic about repeating that mistake. I'd prefer the idea of letting the caller actively choose to pass in an alternative implementation.
@palango any further thoughts here?
hrm, I did some benchmarking (based https://github.com/richvdh/json_benchmark).
using python 3.7.5 (seconds per loop):
dumps (large obj) dumps (small objs)
json 2.0.9 0.015 0.020
simplejson 3.17.0 0.030 0.046
So yeah, we should probably switch back to stock json
.
it seems like there was a substantial change to the performance of the stdlib json
somewhere between python 2.7 and 3.5. Now that 2.7 is unsupported, I'm not too worried about any performance regressions for it.
as per the benchmarks on #9, this would also allow us to remove the _unascii
magic.
An update on the situation here: as of https://github.com/matrix-org/python-canonicaljson/pull/25, it is possible to use stdlib json rather than simplejson for encoding; however, we still pull it in with setup.py
, so this wouldn't solve the original reporter's problem.
Properly dropping the simplejson
dependency requires thought about how to do so without breaking compatibility with older applications that might pass bytes
values into canonicaljson.encode
.
Currently the package pulls
simplejson
as a dependency.This caused some unexpected trouble in https://github.com/raiden-network/raiden/issues/4174 . It also seems that since python 2.6
simplejson
is no longer a big performance gain: https://github.com/kennethreitz/requests/issues/3052#issuecomment-197021335So my idea would be to remove the dependency. If you agree I can create a PR.