psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Information leakage in default user agent #2785

Closed asieira closed 8 years ago

asieira commented 9 years ago

I've noticed that requests adds a user-agent header by default that looks like the following: python-requests/2.7.0 CPython/2.7.9 Linux/3.14.44-32.39.amzn1.x86_64.
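For illustration, a string with the same three parts can be assembled from Python's standard `platform` module. This is only a sketch of where such values could come from, not the library's actual code; `verbose_user_agent` and `REQUESTS_VERSION` are hypothetical names introduced here.

```python
import platform

# Hypothetical stand-in; the real value would come from the installed library.
REQUESTS_VERSION = "2.7.0"


def verbose_user_agent(name="python-requests", version=REQUESTS_VERSION):
    """Build a UA string with the same three parts as the example above."""
    implementation = platform.python_implementation()  # e.g. "CPython"
    python_version = platform.python_version()          # e.g. "2.7.9"
    system = platform.system()                           # e.g. "Linux"
    release = platform.release()                         # kernel release on Linux
    return " ".join([
        f"{name}/{version}",
        f"{implementation}/{python_version}",
        f"{system}/{release}",
    ])


print(verbose_user_agent())
```

Everything after the first token describes the host rather than the library.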

Including the CPython and OS kernel versions is an unnecessary information leak that could have security implications, as per the OWASP documentation.

I would like to suggest that the default be changed to follow the 'bot' convention described here, with the requests version and a link to the requests documentation: python-requests/<version> (http://www.python-requests.org/).

sigmavirus24 commented 8 years ago

Eh, I'm 50/50 on this. I agree the default user-agent isn't ideal. I don't agree that this is a large problem. I'd like to hear @Lukasa's opinion on this but he's away on vacation (as I should be).

asieira commented 8 years ago

Agreed that this is not a critical issue, but it is a security problem nonetheless.

Thank you very much for the awesome project, enjoy your vacation and let me know if there's anything I can do to help. :)

sigmavirus24 commented 8 years ago

So, I've ruminated a bit on this now that I've had a chance and I'm not quite convinced it's a security problem.

  1. this is easily overridden by the user
  2. your attackers, without knowledge of your use of requests, cannot be certain whether the information there is accurate or has been faked (any other user-agent can be set just as easily)
  3. who the attackers are in this case is not very clear
    • An attacker could be the server you're intentionally talking to (e.g., example.com), but at that point there are other potential attacks that they could carry out even without (for example) knowledge of your Python version or kernel information
    • An attacker could be someone who is actively performing a MITM attack on a connection; in that case you have much larger problems than the attacker knowing your Python version or kernel information

In short, I think we need a much better threat model here before we change the existing behaviour. I'm fine with documenting that the default user-agent is perhaps not ideal for some use cases, and pointing to the existing documentation that shows how to set a UA.
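For reference, overriding the UA looks roughly like the sketch below. It assumes requests is installed; `CUSTOM_UA` and its value are placeholders, not a recommendation. The request is only prepared, never sent, so the outgoing header can be inspected without network access.

```python
import requests

# "my-app/1.0" is a placeholder product token, not a recommended value.
CUSTOM_UA = "my-app/1.0"

# Per-session: every request sent through this session carries the custom UA.
session = requests.Session()
session.headers["User-Agent"] = CUSTOM_UA

# Prepare a request without sending it, to inspect the outgoing headers.
prepared = session.prepare_request(requests.Request("GET", "http://example.com/"))
print(prepared.headers["User-Agent"])

# Per-request alternative: pass headers= on an individual call instead.
# requests.get("http://example.com/", headers={"User-Agent": CUSTOM_UA})
```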

asieira commented 8 years ago

My responses:

  1. the fact that there is a workaround (setting another UA value) doesn't make the default any less secure;
  2. since this is the default, I have to imagine the vast majority of requests-based software will be leaking valid data through this mechanism, so the odds are in favor of the attackers;
  3. let me try to help you understand the potential threats here:
    • If HTTP is being used (rather than HTTPS), then anyone in the communications path can read that information from any request they are able to observe. So an attacker inside an organization trying to move laterally into a neighboring server will be able to learn a lot about which software is running on the machine, for example.
    • There are many ways in which a requests-based client could be led to a malicious server: DNS poisoning, XSS on a valid web page, etc. If a malicious server is contacted by a requests-based client it will then learn useful information about the client that can help an attacker that controls it choose a method to try and compromise it. This obviously applies even to HTTPS traffic.
    • I disagree with "if you are vulnerable to X you have much larger problems already". Someone might have control over the network but know nothing about the target host's OS and software versions, and giving them this information can make their lives a lot easier. Even if you argue the attacker could find the same information elsewhere, the fact that he is able to do it passively allows him to obtain that data while minimizing the chances of being detected. Most alternatives, such as doing a port scan with service fingerprinting, would be much noisier and might tip off the defenders to the attacker's efforts.
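To make the passive case concrete, here is a small sketch of how little work it takes to turn an observed UA header into host details. `parse_product_tokens` is a hypothetical helper written for this example, and the input is the sample string from the top of this issue.

```python
def parse_product_tokens(user_agent):
    """Split a UA string of space-separated product/version tokens into a dict."""
    tokens = {}
    for token in user_agent.split():
        product, sep, version = token.partition("/")
        if sep:  # skip anything that is not in product/version form
            tokens[product] = version
    return tokens


observed = "python-requests/2.7.0 CPython/2.7.9 Linux/3.14.44-32.39.amzn1.x86_64"
info = parse_product_tokens(observed)
print(info["CPython"])  # 2.7.9
print(info["Linux"])    # 3.14.44-32.39.amzn1.x86_64
```

No probing of the host is involved; the client volunteers the interpreter and kernel versions on every request.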

At the end of the day, even if you think the security gain is marginal, I guess we can all agree that it is not zero. Also, AFAIK changing this default breaks no requests functionality whatsoever (correct me if I'm wrong here). So, I would still maintain this is worth doing.

That is just my opinion, of course. More than happy to speak on a Skype or Google Hangouts call since complex subjects such as these might be easier to discuss that way.

Lukasa commented 8 years ago

My general position on this is that removing the kernel version is a good idea and we should do it. Other UA strings do not include it, and it's unlikely to be of much use in any role other than attacking a Linux kernel directly, so I'd be happy to strip it.

I'm +0 on removing the Python version, though we need to confirm with @dstufft that pip's not relying on our use of it. I don't believe it's a serious attack vector, but neither is it information that it's vital to be sending in the UA string.

IMO the biggest security risk in there is actually the requests version, and that's the one thing it's hard to justify removing. ;)

dstufft commented 8 years ago

pip doesn't use the default user-agent at all, feel free to change it to whatever you want.

asieira commented 8 years ago

For the record, I agree completely with @Lukasa; his proposal matches my original suggestion of using python-requests/<version> (http://www.python-requests.org/) as the default UA perfectly.

sigmavirus24 commented 8 years ago

@asieira that is not @Lukasa's proposal. Please don't misrepresent people's comments.

asieira commented 8 years ago

If I understand @Lukasa's comments correctly, he agrees that a) removing the kernel version is a good idea, and that b) the Python version is not vital information to include in the UA, so it could be removed provided this doesn't break pip (which was subsequently confirmed by @dstufft).

So that would leave us with a default UA that has no kernel or Python version and contains only the requests version, which is precisely what I had originally proposed.

I'm sorry if I misunderstood any of that. I don't want to start a flame war here or anything, but now I'm genuinely curious as to what part of @Lukasa 's comment you believe I misinterpreted or misrepresented.

Lukasa commented 8 years ago

The thing you're disagreeing about is whether a URL would be contained in the User-Agent: I did not propose that, you did. =)

I'm -1 on the URL, but +1 on everything else.

sigmavirus24 commented 8 years ago

Damnit @Lukasa you always post a comment ~30s before I can.

@asieira I didn't mean to put you on the defensive. I simply didn't want someone coming along and implementing the wrong thing based on your comment. Clarity and correctness are important.

Frankly, I'm -0 on removing this information but I wouldn't block a PR removing it.

asieira commented 8 years ago

@Lukasa and @sigmavirus24 thank you for clarifying that. The important part for me is removing the sensitive information. Not at all married to the specific format or having the requests URL there at all, sorry I didn't make that clearer before. You are absolutely right that clarity and correctness are important.

Do we all agree with requests/<version> as the UA string, then?

I can submit a PR for this, if you wish, as soon as we agree on the exact format.

Lukasa commented 8 years ago

That suits me =)

Lukasa commented 8 years ago

Resolved. =)