psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.15k stars 9.32k forks source link

Using a `json.JSONDecoder` fails when simplejson is present #4842

Open ricellis opened 6 years ago

ricellis commented 6 years ago

It appears that when simplejson is in the environment requests preferentially imports it. However, the arguments accepted by JSONDecoder are inconsistent between the standard library json and simplejson leading to errors when using the standard json.JSONDecoder when simplejson is in the environment.

Your documentation for response.json()says:

Parameters: | **kwargs – Optional arguments that json.loads takes.

The documentation for json.loads says:

json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

and

To use a custom JSONDecoder subclass, specify it with the cls kwarg; otherwise JSONDecoder is used. Additional keyword arguments will be passed to the constructor of the class.

I expected to be able to use a custom json.JSONDecoder without issue and indeed this works on Python 2.7, but in Python 3.7 it fails.

I can see that the issue of preferentially importing simplejson has been raised a few times e.g. https://github.com/requests/requests/pull/2516 https://github.com/requests/requests/issues/3052

and that it is slated for removal in Requests 3, which I guess will resolve this issue.

If it is not possible to resolve this issue some other way in Requests 2.x then it would be nice if the documentation around response.json() was updated to make it clear that the arguments to json.loads could be either simplejson.loads or the standard lib json.loads depending on the environment since the preferential import is effectively undocumented and non-obvious.

Expected Result

A successful JSON load when using r.json(cls=json.JSONDecoder).

Actual Result

    print(r.json(cls=json.JSONDecoder))
  File ".../python3.7/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File ".../python3.7/site-packages/simplejson/__init__.py", line 535, in loads
    return cls(encoding=encoding, **kw).decode(s)
TypeError: __init__() got an unexpected keyword argument 'encoding'

Reproduction Steps

pip install requests
pip install simplejson
import json
import requests

r = requests.get('http://localhost:5984')
print(r.json(cls=json.JSONDecoder))

System Information

Note this issue does not occur in Python 2.7.x.

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.7"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.0"
  },
  "platform": {
    "release": "18.0.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.20.0"
  },
  "system_ssl": {
    "version": "1000210f"
  },
  "urllib3": {
    "version": "1.24"
  },
  "using_pyopenssl": false
}
nateprewitt commented 6 years ago

Hi @ricellis, thanks for revisiting this. I don't believe we'll be seeing this removed in the near future until 3.0.0 is ready. If you'd like to make a small addendum to the documentation about this, we'd be happy to review it.

ricellis commented 6 years ago

Thanks @nateprewitt I can update the docs, but would you be willing to accept a patch that protects people from the error if they are expecting to be using json, but simplejson happens to be present in the env (probably unbeknownst to them).

For example adding a check to Response.json() that uses the built-in json package if cls extends the built-in json.decoder.JSONDecoder. Something like this https://gist.github.com/ricellis/0690ebfc7f1a56df64df87c7a5251e77

sigmavirus24 commented 6 years ago

Your proposed change breaks the deterministic behaviour of that function which is not ideal in the slightest. I don't think we can accept that.

ricellis commented 6 years ago

breaks the deterministic behaviour of that function

It's annoying that the json() method could end up using a different module to what is used elsewhere, but that seems better than an error to me. It is deterministic though, the outputs are governed by the inputs and I think it only impacts the case that is currently broken:

simplejson cls current proposal
yes extends json.JSONDecoder Error Decoded by json using cls
yes None Decoded by simplejson Decoded by simplejson
no extends json.JSONDecoder Decoded by json using cls Decoded by json using cls
no None Decoded by json Decoded by json

not ideal in the slightest

It was just a first stab at something that might work and I was trying to contain the change to just the specific area of the problem.

I agree it is not the cleanest thing to do, but neither is swapping the JSON module with an API incompatible one. That's where this issue is different from the others that have been reported previously around the preferential import - this has identified a place where simplejson and json are not API compatible yet requests exposes that API to the end user through the Response.json() kwargs.

That in itself might not be a problem when the requests caller is also in control of the module environment. As you can see from our linked issue though the problem for us comes when a user of our library includes another module that itself has a dependency on simplejson. We are using the requests API in good faith, but it is broken by a transitive dependency over which we have no control. I think the discussion around simplejson that has preceded this issue didn't have to account for that.

Perhaps someone can think of a better option than my proposal or maybe an option to disable the simplejson preferential import could be revisited in light of this particular scenario?

Or if you're not interested in fixing 2.x at all then just say so and I'll submit a docs patch and spend my time implementing a workaround in our library instead.

ricellis commented 5 years ago

@nateprewitt @sigmavirus24 any thoughts on this ^?

SharpEdgeMarshall commented 4 years ago

Simplejson is slower than stdlib json implementation in python 3, why don't you want to allow to override the default behavior at least?

ruuda commented 4 years ago

I ran into this problem when we added a third-party library that had a transitive dependency on simplejson. That consequently broke all of our error handling, because requests now started throwing simplejson.errors.JSONDecodeError instead of json.JSONDecodeError that we had catches for. Judging from the referenced issues above, more users are affected.

verata-veritatis commented 3 years ago

Having the same issue as the user above. requests should just be using one or the another. Using either or conditionally only creates confusion.

BarryThrill commented 3 years ago

Any update :)?

rwst commented 2 years ago

I came here because berserk uses request, and this issue comes up. Can you plz give a sttatus update?

rwst commented 2 years ago

See also https://github.com/psf/requests/issues/3052

krystofernewman commented 2 years ago

Our team has run into a similar issue, where we have simplejson installed but we want to use a custom json encoder class in the prepare_body method which now fails

nateprewitt commented 2 years ago

Hi @krystofernewman, it doesn't sound like your issue is related to this one directly. We're specifically discussing simplejson's impact on Response.json(). There are likely similar issues on the json Request argument but that's a topic that would be handled separately.

simplejson support was added in a very different time (almost a decade ago) but has been the default for the majority of the library's life. We agreed some time ago that a documentation update noting this would be accepted but cannot change the behavior until a major version bump.

krystofernewman commented 2 years ago

@nateprewitt gotcha thanks for the response, I can open a separate issue for the impact of simplejson on the encoding of the request's body preparation if that works

olejorgenb commented 2 months ago

Another incompatible change in simplejson: https://github.com/simplejson/simplejson/issues/324 (eg.: by adding a allow_nan parameter and defaulting it to False in the decoder)