psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.21k stars 9.34k forks source link

Don't override `Authorization` header when contents are bearer token (or any other token) #3929

Open tomvlk opened 7 years ago

tomvlk commented 7 years ago

I found out that the requests lib is overriding the authorization header when a netrc file is in place, which is awesome. But in some cases you won't want this at all, and is a design flaw imo.

For example you use a bearer token, it gets replaced by the user/password from netrc.

Also see the issue here: https://github.com/python-social-auth/social-core/issues/43

Lukasa commented 7 years ago

This behaviour can be overridden by trust_env, which allows you to instruct Requests to ignore the .netrc file. Is that sufficient for your purposes?

tomvlk commented 7 years ago

Well, if you look at the issue of the social auth library, you may see that it's really nice to override the user/pass. Short explanation about my situation related to social library: I'm using a webservice that requires you to use the username/password HTTP authentication, for this I require the use of .netrc which is perfect. But on the same domain/path there is a oAuth2 endpoint that uses the Authorization header with the oAuth tokens.

If the .netrc feature/standard is only supporting username/password, why not replacing header when not containing a bearer or any other token. This could be done with checking the contents if the header is manually provided.

Lukasa commented 7 years ago

So, I am open to that, but nervous about it. There are failure modes at either side, and trying to be "clever" may cause us to just be difficult to understand. The virtue of the model we have today is that it is very simple and consistent. So my question stands: do the functions currently available suffice for your use case?

tomvlk commented 7 years ago

Well I agree on the fact that it's not really nice to change such behavior right now as it's always breaking something. And indeed the usage of the trust_env is a good option, but in this case the author of the library should give us an option to enable/disable it as a backend developer. Also the trust_env will disable more environmental settings or behavior and not only the netrc function, like proxy settings if I understood right.

Another option would be turning it on/off per request that overrides the session trust_env, or have another way to not override one specific header.

Anyway, the module is already trying to be "clever" by replacing the whole header :smiley: . Which is great when you have full control over the Session.

Lukasa commented 7 years ago

So it is quite possible that the library wrapping requests should be setting trust_env to False if it is handling headers itself.

tomvlk commented 7 years ago

Yep, and that should be the case. But it remains that if you have mixed requests, like I have, it's kinda hard to manage. You need to have two sessions.

Lukasa commented 7 years ago

Well, you don't really. You can just flip the setting around as-and-when you need it.

However, here's a framework I'd consider for handling auth in the 3.0 branch. I'd welcome a PR to make this the case.

  1. If the user sets an Authorization header themselves, either via the request or on the Session, we don't bother to look at the netrc file.
  2. If they didn't, we look at the netrc file for basic auth.
  3. If we get redirected, we fall back to only looking at the netrc file (which we already currently do).

Does that sound like a reasonable approach to your case?

tomvlk commented 7 years ago

That sounds like a pretty clear way to solve this case. Only downside is that it can cause breaking code. But it's better to not force override when user manually given the details in my opinion.

I'm not sure if I'm capable of doing the PR myself due to time limits.

fjarlq commented 7 years ago

I encountered this problem when trying to figure out why the python-digitalocean module, which uses Requests, was failing due to an unexpected authentication error.

The root cause turned out to be this default directive in my $HOME/.netrc:

default login anonymous password anonymous@

which I used, many years ago, to automate my anonymous FTP logins.

I'm surprised that this directive in my .netrc would cause Requests to automatically override the authentication information that is being specified explicitly by python-digitalocean, especially since the directive is merely setting default login information that is used when accessing a host that lacks an explicit machine entry in .netrc.

So I like @Lukasa's idea above: when the caller specifies an Authorization header, I think the .netrc directives (whether default or machine) should be ignored. Thanks!

lmazuel commented 6 years ago

Got bitten by this one as well :( To answer @Lukasa question:

do the functions currently available suffice for your use case?

I would say no, because trust_env is not only netrc, it's also REQUESTS_CA_BUNDLE for instance. Right now it's a little too much "take it or leave it" for the entire set of possible env stuff I could want to use :(

tkdchen commented 6 years ago

It would be nice to allow to disable reading from .netrc explicitly. Currently, it looks requests also handles an existing .netrc even if for requests.get which is expected to be a anonymous request in most cases.

tkdchen commented 6 years ago

How about provide a auth class something like NetrcAuth instead?

danrue commented 6 years ago

Coming here after spending several hours debugging an issue which ended up being the presence of a ~/.netrc file. This behavior violates POLA and should be explicitly enabled rather than enabled by default.

bors-ltd commented 5 years ago

I lost half a day because I could not log to production any more, and I couldn't find the issue in our infrastructure. Found out it was because I stored my password in ~/.netrc and requests read it and added an Authorization header when I was using a Bearer instead, and got rejected from the server.

It should only happen with an explicit BasicAuth().

Lord-of-the-Galaxy commented 4 years ago

This issue still exists. Ideally, requests would only use the credentials in the netrc when there is no authorization header explicitly supplied.

klange-git commented 2 years ago

I think this behavior should be changed. I installed an FTP package that silently generated a sample ~/.netrc. This prevented me from using gcloud (the GCP CLI) because gcloud uses Requests and Requests prefers ~/.netrc over gcloud's OAuth headers. I agree fullly with Lord-of-the-Galaxy's suggestion.

jacksgt commented 2 years ago

I also just came across this behavior (based on a user report and it took us about a month to figure out that the presence of a .netrc was the "issue" in the user's environment) It's quite incredible that python-requests will overwrite a header explicitly set by the programmer - I mean, I'm not writing headers={'Authorization': 'xyz'} for fun in my code!

nedbat commented 1 year ago

I also spent a while debugging this exact problem, and I'll add my voice: requests should not override an explicitly set "Authentication" header.

hellosputnik commented 1 year ago

I tried in https://github.com/psf/requests/pull/6555, guys 😅

I essentially just implemented @Lukasa's suggestion, but I got slapped first with "It's documented!", followed by "Why aren't you writing an auth adapter instead of adding/modifying the header?", and finally "It's not backwards compatible so fuck off!". The standoffish attitude seems to be typical behavior for this maintainer so at least it's not personal.

I would've been happy to discuss alternatives such as displaying a warning to the user when the Authorization header has been set and has been overridden by .netrc or introducing a new parameter to control .netrc or waiting for a major release for the changes to adopted.

It's disappointing to say the least.

perthi commented 10 months ago

I also struggled with this problem for about a day and had to dig deep into the http library to figure out that my .netrc file (which was valid but for another host was to blame). No warnings, nothing.

The behavior doesn't makes sens, and is also inconsistent.

I think its fair to call this a bug, that has cos a lot of working hours for many people. In my case i need to keep the .netrc file for another API

A workaround for the bug is something like this

s=requests.Session()
s.trust_env=False
response = s.get(CUSTOMER_LOCATIONS_ENDPOINT, headers = hdr, data=data2, verify=None) 

So 3 lines of code, instead of one for the same operation.

True, the trus_env variable is documented, but why not set it to false as default, or at very least write a warning f headers are ignored.