pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

pip should not execute arbitrary code from the Internet #425

Closed: glyph closed this issue 11 years ago

glyph commented 12 years ago

When you 'pip install' something, it fetches the code from the internet, and then executes it. If you follow the advice of many projects and 'sudo pip install' something, pip executes that code from the internet as root.

pip does not do TLS certificate verification, nor does it do package signature verification, nor does it even do DNSSEC. There is no assurance whatsoever that the code being installed came from the intended source. The archetypal hipster hacker doing a 'pip install django' over some cafe's wifi will be pwned within seconds if the DNS for pypi.python.org happens to be spoofed.

I believe that this might be addressed by https://github.com/pypa/pip/pull/402 but that deals with a bunch of other issues as well, and I felt there should be a report somewhere about this somewhat well-known deficiency in pip's download and update procedures.

carljm commented 12 years ago

Indeed, thanks for the concise summary. In addition to TLS cert verification and package signature verification, we should also have an option to forbid downloading any off-PyPI sdist that isn't served by HTTPS.

glyph commented 12 years ago

Ultimately, package signature verification is the main thing. If the bytes are properly signed and authenticated, the transport can be any old insecure crud and it shouldn't matter: the software being executed is the right software regardless of how it got there.

kumar303 commented 12 years ago

Using requests + certifi would be an easy way to add proper cert checking.
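A minimal sketch of what "proper cert checking" involves on the client side. It swaps in the standard-library `ssl` module instead of requests + certifi so that it has no third-party dependencies; with certifi installed, the path from `certifi.where()` could be passed as `cafile` (requests takes the same kind of path through its `verify` argument). The helper name `make_verified_context` is hypothetical.

```python
import ssl
from typing import Optional


def make_verified_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server.

    If cafile is None, the platform's default trust store is loaded;
    otherwise the CA bundle at that path is trusted (for example the
    path returned by certifi.where(), if certifi is installed).
    """
    # create_default_context() already enables certificate and hostname
    # verification for server authentication; setting both explicitly
    # documents the intent and guards against accidental regressions.
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx


# Usage (needs network access, so not executed here):
#   import urllib.request
#   ctx = make_verified_context()
#   urllib.request.urlopen("https://pypi.org/simple/", context=ctx)
```

Both checks matter: a valid chain without a hostname check still lets an attacker present any CA-signed certificate they own.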

qwcode commented 12 years ago

If pip gets support for "wheel" (see this fork: https://github.com/qwcode/pip), we'd be doing this for wheels at some point at least, since the wheel spec provides for it, but @dholth can speak to that better than I can.

wheel docs: http://wheel.readthedocs.org/en/latest/

ioerror commented 12 years ago

The main reason not to rely on package signatures alone is that old signatures can be replayed. Defense in depth seems to be a reasonable idea when it comes to installing and updating code.

A rather good overview of the entire nightmare was written by Cappos et al:

https://www.updateframework.com/
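Replay of old but validly signed data is one of the attacks the Update Framework is designed to resist. As a rough sketch of the idea only (not TUF's metadata format or API), a client can refuse signed metadata that has expired or whose version is lower than one it has already accepted. The `SignedMetadata` type, its fields, and `check_fresh` are hypothetical, and signature verification is assumed to have happened before this check.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SignedMetadata:
    # Hypothetical fields. The signature over them is assumed to have
    # been verified already; this sketch only checks freshness.
    version: int
    expires: datetime  # timezone-aware


class FreshnessError(Exception):
    """Raised when validly signed metadata is stale or rolled back."""


def check_fresh(
    meta: SignedMetadata,
    last_seen_version: int,
    now: Optional[datetime] = None,
) -> int:
    """Return meta.version if it is safe to accept, else raise."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now >= meta.expires:
        # A captured signature stays cryptographically valid forever;
        # an expiry date bounds how long it can be replayed.
        raise FreshnessError("metadata has expired")
    if meta.version < last_seen_version:
        # Older than something already accepted: a rollback or replay.
        raise FreshnessError("metadata version rolled back")
    return meta.version
```

The client would persist the returned version and pass it as `last_seen_version` on the next update, which is what turns a signature check into replay protection.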

wyuenho commented 11 years ago

Any progress on this? Given the recent attack on Rubygems.org, I'd really like to see this issue given the highest priority. Package authors aren't going to sign their packages unless they know the installer supports it.

kirubakaran commented 11 years ago

+1

beaumartinez commented 11 years ago

+1

byrongibson commented 11 years ago

+1

reidrac commented 11 years ago

+1

PaulMcMillan commented 11 years ago

This is a little easier today than it was a year ago when we last talked about it. pip no longer supports Python 2.4, which caused much trouble. Python 2.5's SSL support is stone-age at best, but most users are on 2.6 or better. If pip included the hostname-checking code backported from 3.2 (http://pypi.python.org/pypi/backports.ssl_match_hostname/3.2a3) and only validated certificates on Python 2.6 and newer (the same way Mercurial does), this might be possible with a relatively small patch.
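The heart of the hostname check that `backports.ssl_match_hostname` provides is comparing the requested hostname against the DNS names in the server's certificate. The function below is a simplified, hypothetical sketch of that comparison, not the backport's code: it allows a wildcard only as the entire left-most label (as RFC 6125 recommends) and ignores IP-address names and IDNA encoding. Note that the standard library's own `ssl.match_hostname` was later removed (Python 3.12); current code should rely on `SSLContext.check_hostname` instead.

```python
from typing import Iterable


def _name_matches(pattern: str, hostname: str) -> bool:
    # DNS names are case-insensitive; a trailing dot denotes the root.
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    p_labels = pattern.split(".")
    h_labels = hostname.split(".")
    if len(p_labels) != len(h_labels):
        # A wildcard matches exactly one label, so label counts must agree.
        return False
    if p_labels[0] == "*":
        # Require at least two labels after the wildcard, so "*.org"
        # cannot match every host under a top-level domain.
        if len(p_labels) < 3:
            return False
        return p_labels[1:] == h_labels[1:]
    # Otherwise require an exact match and reject partial wildcards.
    return "*" not in pattern and p_labels == h_labels


def match_hostname(cert_names: Iterable[str], hostname: str) -> bool:
    """Return True if hostname matches any DNS name from the certificate."""
    return any(_name_matches(name, hostname) for name in cert_names)
```

Without this step, certificate validation alone would accept any valid certificate for any domain, which is exactly what a MITM attacker can obtain.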

kumar303 commented 11 years ago

With that code, where does the CA bundle come from? Wouldn't pip also need something like certifi? It looks like you can't get root certs out of the box on 2.6+.

qwcode commented 11 years ago

Hello, I'm one of the pip maintainers. I don't claim to have the security expertise to lead this effort, but I'm certainly interested in helping anyone willing to attempt pull requests with the basics of code placement and writing tests.

zyga commented 11 years ago

Hi.

I'm writing a small subsystem that manages trust in stuff downloaded from the Internet. It can be plugged into pip, but also into other tools, including ones from the Ruby world. Ping me on Twitter (@zygoon), here on GitHub (zyga), or on IRC (also zyga) if you are interested in helping out.

westurner commented 11 years ago

+1

X-posted from http://www.reddit.com/r/Python/comments/17rfh7/warning_dont_use_pip_in_an_untrusted_network_a/c8923kp

Here are some links which may help in developing a solution to this vulnerability:

reidrac commented 11 years ago

distutils supports package signing with GPG: http://docs.python.org/2/distutils/uploading.html

It creates a PACKAGE.asc file that pip could potentially download and verify with gpg (behind an opt-in flag, not by default). It won't solve the key management problem, but at least, if you're interested, you can get the GPG key of the developer(s) and add it to your keyring so the signature can be verified.

That could be a good start.

PyPI should then encourage packagers to sign their packages (maybe including a "how to" for GPG newbies: create a key, back it up, create a revocation certificate, back that up, optionally export the key to a keyserver, etc.).

It might also be good to recommend that the author_email in setup.py match the GPG key's email, so that pip can check it.
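The suggested author_email check could look roughly like the hypothetical sketch below. It assumes the signer's user IDs have already been extracted from the public key (for example from gpg's machine-readable output) in the usual `Name (comment) <email>` form; the function name is illustrative. As the next comment points out, a matching email alone does not establish trust.

```python
import re
from typing import Iterable

# Matches the "<email>" part of an OpenPGP user ID such as
# "Jane Doe (work) <jane@example.org>".
_UID_EMAIL = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def author_email_in_uids(author_email: str, uids: Iterable[str]) -> bool:
    """Return True if author_email appears in any of the key's user IDs."""
    wanted = author_email.strip().lower()
    for uid in uids:
        for found in _UID_EMAIL.findall(uid):
            # Compared case-insensitively for simplicity, although the
            # local part of an address is technically case-sensitive.
            if found.lower() == wanted:
                return True
    return False
```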

zyga commented 11 years ago

@reidrac That is insufficient: all it does is allow anyone to mount a MITM attack by repackaging any software as "Joe User" with a valid GPG signature (for that user).

zyga commented 11 years ago

I've started working on a tool that could be integrated with pip (and other tools) to verify downloaded software. It does not require SSL or any trusted networking of any kind. Have a look and help me design and implement it: https://github.com/zyga/distrust

dholth commented 11 years ago

With digital signatures you would probably want a system that trusts the signing key per-package. For example, I would accept the Django publisher's key for Django but not for Turbogears.
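A per-package trust policy like this can be sketched as a mapping from package name to the key fingerprints accepted for that package. `TRUSTED_KEYS`, `is_key_trusted_for`, and the fingerprints are hypothetical placeholders, and checking that a signature was actually made by the key is assumed to happen elsewhere.

```python
from typing import Dict, FrozenSet

# Hypothetical policy: package name -> fingerprints allowed to sign it.
# The fingerprints are made-up placeholders, not real keys.
TRUSTED_KEYS: Dict[str, FrozenSet[str]] = {
    "django": frozenset({"AAAA1111"}),
    "turbogears": frozenset({"BBBB2222"}),
}


def normalize(name: str) -> str:
    # Treat "Django" and "django" as the same project.
    return name.strip().lower()


def is_key_trusted_for(package: str, fingerprint: str) -> bool:
    """Accept a signing key only for the packages it is pinned to."""
    allowed = TRUSTED_KEYS.get(normalize(package), frozenset())
    return fingerprint.upper() in allowed
```

The Django key is accepted for Django but rejected for Turbogears, even though the signature itself would verify in both cases.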

zyga commented 11 years ago

@dholth yes, this is exactly what distrust aims to implement

reidrac commented 11 years ago

@zyga There's no code in your repo, but as spec it looks interesting. Looks like a good answer for pip signature verification.

zyga commented 11 years ago

Code is coming this evening; I'm still working on it and am busy with my regular job ATM.

dholth commented 11 years ago

The most engineered [Python] update security system is probably https://www.updateframework.com/ . It has a lot of interesting ideas, most importantly the ability to survive certain types of key compromises.

jsullivanlive commented 11 years ago

+1. At the last PyCon (or the one before?), a speaker was going to show us how to intercept pip's communication by injecting a packet before pip could respond. I love the idea of all my pip packages being signed so I can use 3rd-party repos or mirrors without worrying.

dholth commented 11 years ago

@zyga please read about:

SDSI/SPKI: http://crypto.stackexchange.com/questions/790/need-an-introduction-to-spki-or-spki-for-dummies

Wheel signatures: http://www.python.org/dev/peps/pep-0427/#signed-wheel-files (the wheel repository at https://bitbucket.org/dholth/wheel/src/e783bb5d75fe392294e018b405a40b788fa69d5d/wheel/signatures/keys.py?at=default has a mechanism for keeping track of key-to-package trust). Wheel uses JSON Web Signatures, which are very easy to implement.

http://tack.io

http://convergence.io

Did you know you can use the ssh-agent to do public key signing and verification?

zyga commented 11 years ago

@dholth I read all of that quickly but I don't know which part of that I should find interesting. Correct me if I skipped something essential.

Wheel signatures are good, but they are in no way an improvement over the existing signatures for source tarballs. Note that I'm not implementing a crypto system or a certificate authority replacement, as none of that really solves the problem for software distribution (so what if code is signed, when anyone can sign it?).

As for all the other things, how are they going to improve the situation? Code signing in itself is not useful for anything as anyone can sign everything. The idea I proposed builds a thin layer of trust semantics on top of the existing GPG system. Do you think I could reuse any of the tools you've mentioned to implement that faster/better/more correct?

zyga commented 11 years ago

The wheel command is pretty much identical to what I've proposed, but weaker, as it 1) cannot take advantage of the existing GPG identity network and 2) has no support for improving trust in unsigned files.

It's still interesting, though, as other ideas seem to match exactly what I wrote.

radiosilence commented 11 years ago

Signed packages and verified SSL by default are two separate issues. The former is more difficult to do (every developer has to sign their packages), whereas I'm honestly shocked the latter doesn't already happen. Even a simple one-line fix of changing the default index to https://pypi.python.org/simple/ would go some way, but verifying SSL certificates is a must.

jsullivanlive commented 11 years ago

Are there any all-python solutions for signing? That may make it more likely to work cross-platform without a lot of overhead (/me looks at windows).

pnasrat commented 11 years ago

Even if we used certifi, PyPI's cert is issued by CAcert, which IIRC is not in the Mozilla bundle:

https://bugzilla.mozilla.org/show_bug.cgi?id=215243#c158

subject=/CN=pypi.python.org
issuer=/O=CAcert Inc./OU=http://www.CAcert.org/CN=CAcert Class 3 Root
radiosilence commented 11 years ago

Ok, I've implemented CA verification for SSL in pip, however I'm not doing a pull request yet because the tests don't pass and I'm not quite sure why.

Here's my fork, could someone please help me figure out what's wrong? https://github.com/radiosilence/pip

pnasrat commented 11 years ago

Do you have a gist of the failures or a link to a travis build?

jezdez commented 11 years ago

Hi all,

The biggest problem here is supporting SSL verification in Python 2.5, which isn't possible AFAIK due to the lack of an ssl module. There are http://pypi.python.org/pypi/backports.ssl_match_hostname/3.2a3 and the ssl package that we could use, but of course that would make shipping pip much harder (requiring a compiler).

I'd be interested to hear what ideas you have for supporting 2.5. Leaving users on that platform unsecured is a bad idea™, especially if PyPI becomes SSL-only in the future.

jezdez commented 11 years ago

@radiosilence This looks pretty great already but is missing the hostname verification as done in 3.2. Getting it from http://pypi.python.org/pypi/backports.ssl_match_hostname/ or Python (license permitting) would be a sensible step.

merwok commented 11 years ago

jsullivanlive: I think I saw a link to that pass on Twitter or email today, I’ll dig it up and post it here.

wyuenho commented 11 years ago

@jezdez If you want to use the ssl module on 2.5, you'll need a compiler to build the C extension anyway. Also, by the time PyPI becomes SSL-only, Python 2.5 support will likely have been dropped.

kumar303 commented 11 years ago

@jezdez not verifying certs on any platform seems worse than only verifying certs on Python 2.6+. Not ideal but way better than nothing.

jezdez commented 11 years ago

@wyuenho Actually looks like PyPI will soon get a "real" cert. ssl requiring a compiler is indeed a problem, so what if pip asks any 2.5 user to install the ssl dependency to enable certificate checks, showing annoying messages as long as it's not there? Could we ask the ssl package maintainer to provide precompiled eggs to make it easier?

@kumar303 I disagree, we can't announce that pip is "more secure" while it's actually only covering part of the user base. I admit there are some stupid requirements to get it right on 2.5, but not providing any option to enable ssl cert checks is ludicrous. Due to the bootstrapping problem, I admit we'd have to make it optional for now, see above.

reidrac commented 11 years ago

@jezdez How large is 2.5 user base?

Final Python 2.5 release was in Sept 2006 and it's unsupported by PSF. Some long time support Linux distributions package 2.5 (even older), but then these users rely on their OS vendor for security updates. In these cases I don't think the distributors package a recent pip version, and if users rely on upstream pip and don't use OS audited packages they have a problem anyway.

It doesn't feel right that 2.5 support in pip holds back a security enhancement that will improve the situation for most pip users. The warning on 2.5 sounds great to me, and 2.6+ is good enough!

jezdez commented 11 years ago

@reidrac I don't know any numbers, it would be interesting to see stats, but I don't know where to get them from.

I think you also missed my point above: I'm not suggesting holding a security enhancement back for 2.5; I'm saying that we can't ignore 2.5, since it's a version pip supports. Because 2.5's SSL support is pretty awful, the only option is to ask the user (with a console message) to install the ssl package to enable SSL support. I don't think that's too much to ask.
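The proposal of checking at runtime whether the `ssl` module is importable, and warning when it is not, might look like this minimal sketch. The function names and message text are hypothetical; on any current Python the module is present unless the interpreter was built without OpenSSL.

```python
import importlib
import warnings


def ssl_support_available() -> bool:
    """Return True if the ssl module can be imported."""
    try:
        importlib.import_module("ssl")
    except ImportError:
        return False
    return True


def warn_if_no_ssl() -> bool:
    """Warn when HTTPS certificates cannot be checked; return availability."""
    if ssl_support_available():
        return True
    warnings.warn(
        "The 'ssl' module is not available, so HTTPS certificates "
        "cannot be verified. Install it to enable certificate checks.",
        RuntimeWarning,
    )
    return False
```

Returning a boolean lets the caller decide whether to continue insecurely (the opt-in behavior discussed above) or refuse outright.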

wyuenho commented 11 years ago

@jezdez or conditionally add it into install_requires for py2.5 systems in setup.py.
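A conditional dependency of that kind could be computed in setup.py from the running interpreter's version, as in the hypothetical helper below. (Modern packaging expresses the same thing declaratively with a PEP 508 environment marker such as `ssl; python_version < "2.6"`.) The base requirement list passed in is a placeholder.

```python
import sys
from typing import List, Sequence, Tuple


def compute_install_requires(
    base: Sequence[str],
    version_info: Tuple[int, ...] = tuple(sys.version_info[:2]),
) -> List[str]:
    """Add the 'ssl' backport only on interpreters older than 2.6."""
    requires = list(base)
    if tuple(version_info[:2]) < (2, 6):
        # Python 2.5 ships without the ssl module; the 'ssl' package
        # on PyPI provides it but needs a C compiler to build.
        requires.append("ssl")
    return requires


# In setup.py: setup(..., install_requires=compute_install_requires([]))
```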

jezdez commented 11 years ago

@wyuenho I wish we could do that, unfortunately it requires a compiler during installation and there are no binary dists of the package.

wyuenho commented 11 years ago

@jezdez It's not just about you; you've forgotten the people who also happen to have a compiler on a 2.5 system and install every Python package from PyPI instead of apt or yum. There's no reason we can't do both. We can also lobby for OS packages of the ssl backport for those old long-term-support distro versions. Since enabling SSL on PyPI is a pretty big deal for the Python community, maybe we should move the maintenance of this package back to the PSF? The PSF could maintain a list of binary eggs there.

qwcode commented 11 years ago

FYI, the active pull request for cert checking is #791.

isislovecruft commented 11 years ago

Hello everyone.

Normally, I aim to avoid blatant self-promotion, but since this issue has been making my work much more difficult for quite some time now, I think I should speak up.

I read on David Fischer's blog that pip may be able to check GnuPG signatures on packages, possibly using python-gnupg. I see that @zyga has done a fair amount of thinking about trust paths and signature verification (more on this in a minute), and though there is no code yet, I should warn that the above linked version of python-gnupg suffers from a vulnerability in which raw user input, including unescaped shell metacharacters (CWE-78), is passed directly to a call to subprocess.Popen([...], shell=True). The maintainer was unresponsive, and several of my projects depended on a Python library for GnuPG, so I forked the project and implemented strict input sanitisation. Notes from my audit are in that repo as well, if anyone needs more details.
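The CWE-78 problem comes from building a shell command string out of untrusted input. A minimal sketch of the safer pattern: hand gpg an argument list and never use `shell=True`, so filenames containing shell metacharacters reach gpg as literal arguments. `gpg --verify SIGFILE DATAFILE` is standard GnuPG usage, but the helper names are hypothetical and this is not python-gnupg's API.

```python
import subprocess
from typing import List


def gpg_verify_argv(signature_path: str, artifact_path: str) -> List[str]:
    """Build the argument vector for a detached-signature check."""
    # "--" ends option parsing, so a filename starting with "-" cannot
    # be misread as a gpg option.
    return ["gpg", "--batch", "--verify", "--", signature_path, artifact_path]


def gpg_signature_is_good(signature_path: str, artifact_path: str) -> bool:
    """Return True if gpg exits with status 0 for the signature check.

    Exit status 0 means the signature is cryptographically good; it says
    nothing about whether the signing key should be trusted for this package.
    """
    result = subprocess.run(
        gpg_verify_argv(signature_path, artifact_path),
        shell=False,  # no shell, so metacharacters are never interpreted
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```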

@zyga I'm really glad to see that someone is thinking about how to get package signature verification into pip. However, as @dholth was presumably trying to point out with the above links on SPKI, while the scheme in your distrust project is a giant step in the right direction, it unfortunately only goes halfway.

As sad as it is -- and as incomprehensible as it may be to the rest of us -- "normal" people are not going to sit there and manually type out commands to compute trust paths. If they were willing to do this, they would already be in the PGP WoT. Users aren't willing to do this to verify that the people/parties they communicate with are who they claim to be, and they are even less willing to do it to verify that those people/parties have some Attribute X which allows Event Y. From your documentation on distrust:

Now Alice wants to install a program that her friend Bob wrote, "useful-tool". Alice already has his public key in her GPG keyring so that part of the problem is out of the way.

No. Sorry, that problem is not out of the way. And it never will be. Alice doesn't trust me and she shouldn't; she's never even met me.

So, Simplified Public Key Infrastructure. It's essentially what the damned CA system is. Distributed trust doesn't mean that you have my GPG key, and then you trust my software if my key has signed it. It means you trust some other entity to do these calculations for you.

For example, I want to upload a module to PyPI. Let's pretend I am creating a new account. PyPI should say, "hey, we're going to send you a verification email. By the way, can you reply to it, and sign your reply with a key which includes this email address in one of its UIDs?" I do that, and now PyPI knows that I have control of the private key. PyPI then certifies my corresponding public key, and publishes that certification somewhere findable. Next, I go to upload my module, i.e. python setup.py register. PyPI should then, by some mechanism, create an attribute which says that my key is the only one allowed to make signatures for that package. There are already mechanisms in GnuPG for doing this; look at the --cert-notation option in GnuPG. I.e. PyPI does the equivalent of $ gpg --cert-notation %f@pypi.python.org=<insert_package_name> --sign-key <maintainers_key>.

Eventually, what you really want is something similar to what Moxie does with Convergence, namely distributed trust and trust agility. The default for pip could be to trust key certifications made by PyPI (and this would likely make the most sense for most users), but what if I wanted to use pip, and only trust key certification made by torproject.org?
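The model described here (a certifier binds a maintainer's key to a package name, and each user picks which certifiers to trust) can be sketched as follows. It illustrates the policy logic only: certifications are modeled as plain records, the `Certification` type and `key_allowed` function are hypothetical, and verifying each certifier's own signature is assumed to happen elsewhere. The certifier names follow the examples in the comment.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Certification:
    # Hypothetical record: "certifier vouches that key_fpr may sign package".
    certifier: str
    key_fpr: str
    package: str


def key_allowed(
    key_fpr: str,
    package: str,
    certifications: Iterable[Certification],
    trusted_certifiers: Set[str],
) -> bool:
    """True if some certifier the user trusts bound this key to this package."""
    return any(
        c.certifier in trusted_certifiers
        and c.key_fpr == key_fpr
        and c.package == package
        for c in certifications
    )
```

Trust agility then amounts to changing `trusted_certifiers`, for example from `{"pypi.python.org"}` (a sensible default) to `{"torproject.org"}`, without touching the packages or their signatures.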

However, requiring certification based on network vantage points, as is the default in Convergence, is difficult to implement on top of the PGP WoT, because very few keys/keyholders use the GPG PKA/IPGP features (see page 9, RFC 4398 and these blog posts), nor do very many keys (to my knowledge and experience) use the OpenPGP Preferred Keyserver URL subpacket in self-signatures, so using Preferred Keyserver subpackets to indicate a common consensus between third-party key certifiers would likely be non-trivial.

dstufft commented 11 years ago

Hey there!

So to respond to the first bit, pip could not force TLS for everything due to backwards-compatibility concerns. The upcoming (in a few days, most likely) pip 1.4 includes warnings whenever you install something not hosted on PyPI (even if the package is directly linked from PyPI and includes an md5 hash), and it also includes warnings whenever you install something that requires hitting a non-PyPI page to discover that version. In addition, pip 1.5 will default to NOT installing or hitting those external URLs unless explicitly requested with --allow-external NAME and --allow-insecure NAME respectively. The pip 1.5 behavior can be gotten in 1.4 by using the --no-allow-external and --no-allow-insecure flags.
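The opt-in defaults described here amount to a per-package policy for off-PyPI links. The function below is a hypothetical sketch of that decision, not pip's code: it assumes each candidate link has already been classified by whether the file is hosted on PyPI and whether finding it required hitting an external page, and it maps those two cases onto the two flags as the comment describes them.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class Candidate:
    # Hypothetical classification of one download link.
    package: str
    hosted_on_pypi: bool           # the file itself lives on PyPI
    found_via_external_page: bool  # discovery required a non-PyPI page


def may_use(
    c: Candidate,
    allow_external: AbstractSet[str] = frozenset(),
    allow_insecure: AbstractSet[str] = frozenset(),
) -> bool:
    """Opt-in policy; names in the allow sets are expected in lower case."""
    name = c.package.lower()
    if c.found_via_external_page and name not in allow_insecure:
        return False  # analogous to requiring --allow-insecure NAME
    if not c.hosted_on_pypi and name not in allow_external:
        return False  # analogous to requiring --allow-external NAME
    return True
```

Making the allow sets per-package rather than global keeps one project's external hosting from weakening every other install.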

Now to go on to the rest of your comment. Package signing would be nice to have, sure. However, everything you've just asked for is already essentially there. If you want to trust PyPI, you simply use PyPI. It is served with TLS, and you can be assured that the package you downloaded is the package PyPI thinks it has. If you don't trust PyPI, maybe you trust someone else instead; they can easily create their own mirror served via TLS, and you can then use them by simply switching your index URL. I've been a big pusher of keeping all dependency data "abstract" (names and versions, not URLs) in packaging metadata to allow just this sort of thing.

I, personally, also hem and haw about the viability of using gpg for this task. IANAL but I feel as if the GPL license has too much potential for forcing GPL onto our toolchains.

That's not to say we'll never get end-to-end signing, and a Convergence-like model is likely a good one. However, there are many problems with the security of the packaging infrastructure, and with the changes in pip 1.3 (TLS verification) and in 1.4/1.5 (external URLs becoming opt-out and then opt-in) we've gotten to a point where, if you trust the authors and the repository you're installing from, you can be reasonably assured that what you're downloading is safe and is what the author uploaded.

dholth commented 11 years ago

Wheel has an experimental signature system whose primary goal is to avoid GPG like the plague. You would share raw 256-bit Ed25519 public keys to establish trust.

I suppose you are aware of http://theupdateframework.com/

nejucomo commented 11 years ago

My impression is that this ticket is too vaguely specified: comments will grow endlessly, but the ticket will never be closed, or if it is, some people will complain that it should not have been.

So:

I propose we replace this ticket with different tickets which are more specific and distinct, such as those for TLS verification, and those related to package signatures.

If you think of more specific issues, please create more specific tickets and cross-link them here. I'm just a pip fan, not a core part of the community, so if anyone prefers not following my suggestion, please speak up.

I'll start the ball rolling with ticket #1035, for a package signature verification "hook" that could allow people to experiment and users to choose and opt in to their preferred scheme.

isislovecruft commented 11 years ago

@dstufft Awesome. This is all really good to hear! Especially that 1.4 will allow opting out of pip's URL crawling via a flag, because then it can be forcefully enabled per project via requirements.txt. I was about one more email away from sending my bug report and POC to cve-assign@mitre.org for pip's crawling behaviour. If this is still helpful to push distros to update their packages, I can still do so. And, of course, I can give you/the PyPI team my audit notes and discussions of the current/recent CVEs. I am really glad that you all are working on more secure update mechanisms. Thanks a lot! :)

Though, I must point out -- as others have on this ticket -- that the following are separate issues:

The first is happily completed. I have yet to re-audit it to confirm.

The second...well, pip and PyPI both default to md5. In PyPI (please correct me if I missed something!), there doesn't seem to be a way to give SHA256 or similar-grade hash digests to package URLs. I was glad to find that there is now a "links" interface in PyPI for maintainers to control what URLs pip downloads from, though if there is a way to specify an alternative hash digest which is actually checked on the client end, I missed it.

The brief version of why you should not use MD5:

And on to the short version of why you should not switch to SHA1:

For the third issue, code signature verification, there should be a way to specify which developer is allowed to sign which packages. See my post above on this ticket.

@nejucomo I am happy to split my responses into multiple tickets, as the pypa devs see fit. I'm not a contributor either, and if there is some pattern to project management I haven't been able to decipher it. Don't wanna mess with their flow. :)

References:

  1. den Boer, B., & Bosselaers, A. (1994, January). Collisions for the compression function of MD5. In Advances in Cryptology—EUROCRYPT’93 (pp. 293-304). Springer Berlin Heidelberg. http://www.cosic.esat.kuleuven.be/publications/article-143.pdf
  2. Manuel, S. (2011). Classification and generation of disturbance vectors for collision attacks against SHA-1. Designs, Codes and Cryptography, 59(1-3), 247-263. http://eprint.iacr.org/2008/469.pdf
  3. Black, J., Cochran, M., & Highland, T. (2006, January). A study of the MD5 attacks: Insights and improvements. In Fast Software Encryption (pp. 262-277). Springer Berlin Heidelberg. http://www.iacr.org/archive/fse2006/40470265/40470265.pdf
  4. Sotirov, A., Stevens, M., Appelbaum, J., Lenstra, A., Molnar, D., Osvik, D. A., & de Weger, B. (2008). MD5 considered harmful today: Creating a rogue CA certificate. http://www.win.tue.nl/hashclash/rogue-ca/
  5. Xie, T., & Feng, D. (2010). Construct MD5 Collisions Using Just A Single Block Of Message. IACR Cryptology ePrint Archive, 2010, 643.
  6. http://marc-stevens.nl/research/md5-1block-collision/
  7. X. Wang, Y.L. Yin, and H. Yu. Finding Collisions in the Full SHA-1. In V. Shoup, editor, Advances in Cryptology – CRYPTO 2005, volume 3621 of Lecture Notes in Computer Science, pages 17–36. Springer-Verlag, 2005. http://f3.tiera.ru/2/Cs_Computer%20science/CsLn_Lecture%20notes/Advances%20in%20Cryptology%20-%20EUROCRYPT%202005,%2024%20conf.(LNCS3494,%20Springer,%202005)(ISBN%203540259104)(588s).pdf#page=48

dstufft commented 11 years ago

If you want to fire off a CVE for it I'll gladly include it in the release notes. I was going to figure out how to do it myself to be honest but I don't care how it happens :)

As far as hashes go, pip itself doesn't default to anything. It can use any hashing algorithm supplied by a URL that is guaranteed to be in hashlib (notably md5, sha1, and any of the SHA-2s). I added this, I think, in 1.2? Maybe 1.3, so that I could use sha256 hashes on Crate.io.

The md5s come from PyPI itself, and currently they are still md5s because of setuptools/easy_install, which only support md5. As far as I'm aware, it is not currently feasible to mount a second-preimage attack against md5 (if you know of an attack that allows this, please, please tell me so I can use it to convince people we need to switch). This is another thing on my list to fix, but I have put it on the back burner since I'm not aware of any preimage attack on md5.

As far as pip itself goes unless ultimately a different scheme than #hashfunc=hash is devised it's already prepared to handle other hashes, it just needs the index to give them to it. There could be an argument against pip allowing md5 at all but again until there's either a preimage attack or PyPI itself switches that is unlikely to gain much traction.
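The `#hashfunc=hash` convention means a download URL such as `https://example.invalid/foo-1.0.tar.gz#sha256=<hexdigest>` carries the expected digest in its fragment. Below is a minimal sketch of checking downloaded bytes against that fragment, limited to md5, sha1, and the SHA-2 family as described above. The helper name and example URL are hypothetical; this is not pip's implementation.

```python
import hashlib
from urllib.parse import urlsplit

# md5, sha1 and these SHA-2 variants are all in hashlib.algorithms_guaranteed.
ALLOWED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def verify_hash_fragment(url: str, data: bytes) -> bool:
    """Check data against a '#<hashname>=<hexdigest>' URL fragment."""
    fragment = urlsplit(url).fragment
    name, sep, expected = fragment.partition("=")
    if not sep or name not in ALLOWED:
        # No usable hash: refuse rather than silently skipping the check.
        raise ValueError("URL has no supported #hashname=hexdigest fragment")
    actual = hashlib.new(name, data).hexdigest()
    # hexdigest() is lower-case, so normalise the expected value too.
    return actual == expected.lower()
```

Given the MD5 and SHA-1 concerns raised earlier in the thread, a stricter client could simply drop `md5` and `sha1` from `ALLOWED` once the index serves stronger digests.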

As far as package signatures go #1305 was recently opened and is probably the best place to talk about that currently. Again this was something on my list and was punted to deal with more pressing issues.

I should probably also mention that any changes to the hash function or package signatures will likely need to go through distutils-sig and go through the bikeshedding contained within.