pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

pip should support custom authentication handlers for private pypi #4475

Open zmt opened 7 years ago

zmt commented 7 years ago

* any OS, really

Description:

This is a feature request.

It would be super-awesome++ if pip supported configuring a custom authentication handler, so that private PyPI repositories are not restricted to HTTP basic auth. Basically, make MultiDomainBasicAuth the default rather than the ONLY option in a PipSession, as it is today: https://github.com/pypa/pip/blob/9.0.1/pip/download.py#L331-L332

This limitation prevents easy integration with stronger authentication (e.g. 2-way TLS, 2FA, etc.) and with the SSO schemes used at enterprises that run private PyPI repositories. The lack of support also makes distributing basic auth credentials, and keeping them from leaking, unnecessarily difficult problems to address.
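For context on the basic-auth limitation: HTTP basic auth (RFC 7617) sends the username and password with every request, Base64-encoded but not encrypted, so anyone who obtains the header or the credentials can reuse them. The sketch below uses only the standard library; the function names are illustrative and not taken from pip.

```python
from __future__ import annotations

import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    # Credentials are joined with a colon and Base64-encoded, not encrypted.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth_header(value: str) -> tuple[str, str]:
    """Recover the credentials from a header value, showing how trivially reversible it is."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError(f"not a Basic auth header: {value!r}")
    # Split on the first colon only; passwords may themselves contain colons.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(decode_basic_auth_header(header))  # ('Aladdin', 'open sesame')
```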

zooba commented 6 years ago

@pradyunsg @dstufft I'm keen to implement this (mainly so we can support other auth schemes with Azure Artifacts when the Python support goes public).

Will you be at the sprints at Bloomberg in a couple of weeks?

pradyunsg commented 6 years ago

I will be, in London.

zooba commented 6 years ago

Okay, good to know.

Our main hope is to support git credential helpers (see https://git-scm.com/docs/api-credentials and https://git-scm.com/docs/git-credential), in no small part because we already have a compatible credential helper that will work for us :)
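The git credential helper protocol linked above is line-based: the caller writes attribute=value lines (for example protocol and host), terminated by a blank line, to `git credential fill`, and reads back lines that include username and password. A minimal sketch of how a client could speak it, assuming git is installed; the function names are hypothetical and nothing like this exists in pip.

```python
from __future__ import annotations

import subprocess
from urllib.parse import urlsplit


def format_credential_request(url: str) -> str:
    """Encode a URL as the attribute=value lines `git credential` reads on stdin."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        # git expects the port, if any, as part of the host attribute.
        host += f":{parts.port}"
    # A blank line terminates the request.
    return f"protocol={parts.scheme}\nhost={host}\n\n"


def parse_credential_response(text: str) -> dict[str, str]:
    """Parse the attribute=value lines `git credential fill` writes to stdout."""
    result: dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the description
        # Split on the first "=" only; values such as passwords may contain "=".
        key, _, value = line.partition("=")
        result[key] = value
    return result


def git_credential_fill(url: str) -> dict[str, str]:
    """Ask git's configured credential helpers for a username and password (needs git)."""
    completed = subprocess.run(
        ["git", "credential", "fill"],
        input=format_credential_request(url),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_credential_response(completed.stdout)
```

If no configured helper knows the host, `git credential fill` may fall back to prompting on the terminal, which is roughly the behaviour pip would want anyway.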

I'm not sure how orthogonal that is to enabling the auth class to be overridden, but we're willing to contribute both. I just want to get some guidance on how many places you think this ought to be touching, or any preferences as to how it's added.

zooba commented 6 years ago

Just posted the same proposal for twine, as it didn't appear to be there. Hopefully we can design something that works similarly for both.

Rather than going directly to the external process credential helper, having a (common) Python interface would be just as easy. Then perhaps we can publish our authorization tool in a package and installing that could enable auth for certain URLs automatically? That may satisfy all the needs here, without having to support any particular external helpers in pip/twine themselves.

webknjaz commented 6 years ago

@zooba do you mean publishing a dist which registers itself within a certain entrypoint pip would recognize?

zooba commented 6 years ago

@webknjaz That's what I'm thinking. An entrypoint or something similar would be fine, as that way pip install <cred-helper> could automatically light up without the user having to add more command line arguments.
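One way the entry-point idea could look: pip would scan a well-known entry-point group and load whatever plugins are installed, so installing a helper package is enough to enable it. The group name `pip.auth_handlers` below is a hypothetical placeholder (pip defines no such group); `importlib.metadata.entry_points` is the standard-library API used for the lookup.

```python
from __future__ import annotations

from importlib.metadata import entry_points

# Hypothetical entry-point group name; pip does not define one.
AUTH_PLUGIN_GROUP = "pip.auth_handlers"


def discover_auth_plugins(group: str = AUTH_PLUGIN_GROUP) -> dict[str, object]:
    """Load every installed plugin registered under the given entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

A plugin distribution would opt in by declaring an entry point in that group in its packaging metadata (for example in a `[project.entry-points."pip.auth_handlers"]` table in pyproject.toml). The plugin still has to be installed before it can help, so this alone does not solve bootstrapping.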

zooba commented 6 years ago

Okay, looks like this is basically adding keyring support, so I filed #5948 (basically, when we can't handle the 401 response from pip's cache, go through keyring first before prompting the user).

I believe that will also work for the original use cases? Keyring has extensible backends, so it would mean installing another package that includes keyring before installing the package. @zmt - thoughts?
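The lookup order described above (pip's own credential cache, then keyring, then an interactive prompt) can be sketched as a small function. This is a simplified illustration, not pip's actual implementation; the keyring lookup and the prompt are injected as plain callables so the sketch runs without the keyring package (a real backend might wrap something like `keyring.get_credential`).

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

Credentials = Tuple[str, str]


def credentials_after_401(
    netloc: str,
    cache: dict[str, Credentials],
    keyring_lookup: Callable[[str], Optional[Credentials]],
    prompt: Callable[[str], Credentials],
) -> Credentials:
    """Choose credentials for retrying a request that received HTTP 401."""
    # 1. Credentials pip already remembers for this host.
    if netloc in cache:
        return cache[netloc]
    # 2. Ask the keyring backend before bothering the user.
    found = keyring_lookup(netloc)
    if found is None:
        # 3. Only fall back to an interactive prompt when nothing else worked.
        found = prompt(netloc)
    cache[netloc] = found
    return found
```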

zmt commented 5 years ago

> I believe that will also work for the original use cases? Keyring has extensible backends, so it would mean installing another package that includes keyring before installing the package. @zmt - thoughts?

I haven't thought about this really since 2017. The addition of keyring support is great, but doesn't appear to help with SSO or using ssh certificates from an ssh-agent, which were 2 authentication methods I initially had in mind back then.

schlamar commented 4 years ago

Wouldn't the best solution be for pip to simply allow users to provide a custom requests.auth.AuthBase? There are already a few useful auth implementations for requests, such as OpenID and Kerberos; see https://requests.readthedocs.io/en/master/user/authentication/

fedorbirjukov commented 4 years ago

Is anyone already working on a solution?

uranusjr commented 4 years ago

@fedorbirjukov If anyone is actually working on a solution, they are not doing it publicly :) Go ahead and work on it!

I don’t think it’s a good idea for pip to provide direct access to requests.auth.AuthBase; pip’s use of Requests should be treated as an implementation detail. An intermediate abstraction would be needed.
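A sketch of what such an intermediate abstraction might look like: plugins implement a small pip-owned interface that deals only in URLs and header dictionaries, and pip adapts it to whatever HTTP library it uses internally. Every name here (`AuthProvider`, `headers_for`, `BearerTokenProvider`, `authorize`) is hypothetical, and the adapter targets the standard library's `urllib.request.Request` purely to keep the example self-contained.

```python
from __future__ import annotations

import urllib.request
from typing import Iterable, Mapping, Optional, Protocol
from urllib.parse import urlsplit


class AuthProvider(Protocol):
    """Hypothetical pip-owned plugin interface: no requests types leak into it."""

    def headers_for(self, url: str) -> Optional[Mapping[str, str]]:
        """Return headers to add for this URL, or None to decline it."""
        ...


class BearerTokenProvider:
    """Example plugin: sends a bearer token to a single index host."""

    def __init__(self, host: str, token: str) -> None:
        self.host = host
        self.token = token

    def headers_for(self, url: str) -> Optional[Mapping[str, str]]:
        if urlsplit(url).hostname == self.host:
            return {"Authorization": f"Bearer {self.token}"}
        return None


def authorize(request: urllib.request.Request, providers: Iterable[AuthProvider]) -> None:
    """Adapter on pip's side; switching HTTP libraries means rewriting only this."""
    for provider in providers:
        extra = provider.headers_for(request.full_url)
        if extra is not None:
            for name, value in extra.items():
                request.add_header(name, value)
            return  # the first provider that accepts the URL wins
```

Because providers never see a requests object, changing pip's HTTP library would not break them. A header-only interface cannot express everything, though: 2-way TLS, for instance, happens below the HTTP layer.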

fedorbirjukov commented 4 years ago

I created PR #8029 based on PR #3731. Fingers crossed. UPDATE: closed it after having a closer look.

amancevice commented 4 years ago

I also opened a PR #8030 that is related to this.

My change adds a --extra-headers option to pip commands that enhances the PipSession object with arbitrary headers so you can do things like token-based authentication.

E.g.:

pip install \
  --extra-headers='{"Authorization": "..."}' \
  --index-url https://secure.pypi.example.com/simple \
  --trusted-host secure.pypi.example.com \
  fizz==1.2.3

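The proposed --extra-headers option (from PR #8030, as described above) takes a JSON object. A minimal sketch of parsing and validating such a value before the headers are attached to a session; the function name is hypothetical and this is not the code from the PR.

```python
from __future__ import annotations

import json


def parse_extra_headers(value: str) -> dict[str, str]:
    """Parse a JSON object mapping header names to values."""
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    headers = json.loads(value)
    if not isinstance(headers, dict):
        raise ValueError("--extra-headers must be a JSON object")
    for name, header_value in headers.items():
        # JSON object keys are always strings, but values could be numbers, lists, etc.
        if not isinstance(header_value, str):
            raise ValueError(f"header {name!r} must have a string value")
    return headers
```

A token passed on the command line ends up in shell history and may be visible in process listings, so supplying it through an environment variable or a config file would be safer.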
uranusjr commented 4 years ago

I’ve cleaned up the previous comments a bit to focus this thread on the remaining issue at hand: implementing a way to plug in custom authentication backends, to support methods such as Kerberos (#6708) and Windows Integrated Authentication (#8163).

The solution will likely be some kind of plug-in system, so a user can install a backend alongside pip and use a flag to tell pip to use it. From what I can tell, the next steps would be to a) come up with a design, and b) identify the places that need to be pluggable. I’m marking this as deferred till PR, since some actual code would likely be the easiest way to kick off the discussion.

ghost commented 4 years ago

I honestly think pip should look to git-remote-helper as a model for a possible solution here. Example usage could simply be something like this:

$ pip install my-private-package --extra-index-url s3://my_private_pypi_bucket/

When the "scheme" of the repository URL (s3 in this case) is unknown to pip, it tries to start a subprocess named something like pip-remote-s3, whose executable would be located on the PATH due to the installation of some 3rd party helper. It then sends "commands" to the subprocess via stdin, much like git-remote-helper.
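The first step of the remote-helper idea, mapping an unknown URL scheme to an executable on PATH, mirrors how git locates `git-remote-<transport>`. A minimal sketch: the `pip-remote-<scheme>` naming follows the comment above, the set of built-in schemes is an assumption, and the actual stdin/stdout protocol is left out because it has not been designed.

```python
from __future__ import annotations

import shutil
from typing import Optional
from urllib.parse import urlsplit

# Assumed set of schemes pip handles natively; anything else goes to a helper.
BUILTIN_SCHEMES = {"http", "https", "file"}


def helper_command_for(index_url: str) -> Optional[str]:
    """Return the helper executable name for an unknown scheme, e.g. 'pip-remote-s3'."""
    scheme = urlsplit(index_url).scheme
    if scheme in BUILTIN_SCHEMES:
        return None
    return f"pip-remote-{scheme}"


def find_helper(index_url: str) -> Optional[str]:
    """Locate the helper on PATH; None means pip handles the URL itself or no helper exists."""
    name = helper_command_for(index_url)
    return shutil.which(name) if name else None
```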

You could allow others to implement whatever custom auth mechanisms they like via one of these helpers, and users need to simply install said helper onto the PATH, then use the helper's corresponding scheme in the index URL. To be honest this isn't even custom authentication support per se, but more custom protocol support which would allow whatever authentication mechanism you'd like. pip install via SFTP? No problem!

I don't know exactly what the protocol between pip and the helper would look like, or at what layer of abstraction it should sit. Should the helper simply send PEP 503-style responses to stdout? Should we allow the helper to ask the user for input directly during pip commands? Should CLI options be passed through from the pip command (something like --<scheme>-helper-options), or should we leave helper configuration to its own devices (config files and the like)? Just some thoughts; I'd like to discuss.

If we choose to go down this path I'd be happy to have a stab at a PR for it. I'm not familiar with pip's internals but I'd like to get involved.

fedorbirjukov commented 4 years ago

@tharradine Good point. I've never used git-remote-helper, at least consciously. But its model seems to allow integrating completely different technologies.

I have used Git on Windows, though, and Git for Windows supports the native Windows TLS stack, Secure Channel (schannel), out of the box. That's what I'd like pip to have, too, but the pip devs are reluctant to go down that road.

di commented 4 years ago

The twine project has a similar feature request: https://github.com/pypa/twine/issues/362

uranusjr commented 4 years ago

I wonder if this is a good candidate for a fundable packaging project. Both pip and twine use requests internally, so it might be a good idea to build an entrypoints-based plugin system that can be used by both. I expect corporations would be the main users as well, so it makes sense to ask them for resources.

schlamar commented 4 years ago

As already mentioned above (perhaps too vaguely), requests already supports custom authentication handlers, so you don't need a complicated process communication protocol: https://requests.readthedocs.io/en/master/user/authentication/

So in theory the user just has to configure a factory that creates such an authentication object (for example, an auth.py file in the pip config folder returning a requests_ntlm.HttpNtlmAuth). Pip creates an instance and passes it to requests.

That would be a really simple solution, and it has the benefit that existing requests auth handlers can be used without modification.

uranusjr commented 4 years ago

We can theorise all day, but ultimately someone still needs to put in time and effort to write the code. Which is where funding comes into play.

schlamar commented 4 years ago

I would expect that organizing funding for my proposal would take more time than implementing the solution...

ghost commented 4 years ago

> We can theorise all day

That's kind of the point of these issues, is it not? Funding is not a prerequisite to discussing design ideas, nor even to an implementation; I've offered my time in a previous comment.

schlamar commented 4 years ago

If someone is willing to help with the configuration part in pip I can make a PoC.

I would propose something like PIP_AUTH_FACTORY/--auth-factory, which should point to a Python file. This Python file has an auth function (or other callable) returning a requests.auth.AuthBase.

For example:

from requests_ntlm import HttpNtlmAuth

# pip would call this factory and pass the returned auth object on to requests.
def auth():
    return HttpNtlmAuth('domain\\username', 'password')

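Loading the file that PIP_AUTH_FACTORY/--auth-factory would point to takes only a few standard-library calls. A sketch under the assumptions of the proposal above (the file defines a callable named `auth`); the function name `load_auth_factory` and the module name `pip_auth_factory` are hypothetical.

```python
from __future__ import annotations

import importlib.util


def load_auth_factory(path: str, attribute: str = "auth"):
    """Import a Python file by path, call its factory, and return the auth object."""
    spec = importlib.util.spec_from_file_location("pip_auth_factory", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load auth factory from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the user's file
    factory = getattr(module, attribute)
    # The result (e.g. a requests AuthBase) would then be handed to the HTTP session.
    return factory()
```

Executing a file named by an environment variable runs arbitrary code with pip's privileges, which is presumably acceptable for a user-configured option but worth documenting.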
ghost commented 4 years ago

@schlamar I agree that a requests auth handler is a simple solution to the use case of authenticating to a PEP 503 repository over HTTPS. For many users I'm sure that is all they need.

Unfortunately I'm a bit more ambitious: I'd like a plugin system that doesn't require any specific transport or application protocol, and doesn't require the package repository to adhere to PEP 503.

Expanding on my S3 example above: I could have a simple repository hosted on an S3 bucket, with no custom HTTP endpoints and no HTML files whatsoever. All that's required is a client-side pip-remote-s3 script that knows how to discover the dists. The subprocess communication protocol need not be "complicated"; in fact, it could be even simpler than PEP 503's "Simple Repository API".
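As an illustration of how small the client side could be, here is a sketch of the discovery step for a bucket laid out as `<normalized-project-name>/<dist-filename>`. That layout is an assumption, and the object keys are passed in directly rather than listed from S3; only the name normalization follows PEP 503.

```python
from __future__ import annotations

import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dists_for_project(object_keys: Iterable[str], project: str) -> List[str]:
    """Return the keys stored under the project's normalized prefix (assumed bucket layout)."""
    prefix = normalize(project) + "/"
    return sorted(key for key in object_keys if key.startswith(prefix))
```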

schlamar commented 4 years ago

@tharradine I see. However, I think this should be discussed in a separate issue (support for custom protocols instead of custom authentication handlers).

ghost commented 4 years ago

@schlamar That's fair enough, I suppose the two concepts are not mutually exclusive and both solutions could well be accepted.

pfmoore commented 4 years ago

Things I'd want to see in any concrete proposal to handle this:

  1. A means whereby it's user-expandable, so that tools like pip don't need to add new code every time someone comes up with a new protocol/handler/whatever.
  2. A way of addressing the bootstrapping issue (user can't install the handler because they need pip to do so, and pip can't install without the handler).
  3. A reusable solution that will work across PyPA tools, so we can avoid having to implement the same feature (possibly with annoying subtle differences) in pip and twine and ...
  4. A clarification of how this fits with the fact that pip has no supported programming API, so any sort of plugin cannot rely on anything about pip's internals remaining constant. (As a practical example, what if we decided to switch from requests to httpx for our network protocol? It's not impossible that we would do this...)
  5. Good documentation and tests for all of the above.

Reasons I think these are important:

  1. These same points come up every time we discuss issues like this. For example, the bootstrapping issue came up with the keyring implementation, and wasn't completely addressed there, so that feature is less useful to some people than it might otherwise be. Let's not repeat that.
  2. Design issues like this are much harder than "just writing the code", and result in maintenance issues longer term if we just accept a PR without considering them.
  3. The interactions between new features for pip and existing features have the potential to become very complex very quickly, and generally when a PR is developed with a focus on just addressing the initial use-case, these interactions are not noticed until after the PR has landed (and often, not until people have started relying on details of the interactions which weren't ever intended). Again, that can be a maintenance issue, making refactoring of pip's code base way harder than we can deal with.
  4. Test infrastructure for this sort of environment generally doesn't exist in open source CI offerings, so it's really hard to ensure adequate testing.

It's really hard to thrash out this sort of "wider issue" in the context of an open source issue tracker/pull request workflow. That's where a funded project, with a clear scope and a remit to look at the broad implications, is a potential way forward for proposals like this. And where the use case is specifically around "corporate" infrastructure like private repositories, some sort of funding can help bridge the gap between volunteer resources who have no "itch to scratch" in this area, and businesses that depend on such support but don't otherwise have a means to influence what features get accepted.

Remember, the pip developer team consists of a very small number of wholly volunteer contributors. We're working on trying to make things more sustainable, but in the meantime we have to be careful how we manage feature additions. Funded development is one way we're exploring of doing this.

(And yes, I understand that the above makes something that "seems simple" into quite a big project. I don't apologise for that - changes to pip can have a huge impact, and we owe it to all of our users to do our best to ensure they are well managed).

pradyunsg commented 3 years ago

I imagine most of the folks interested in this are operating in a corporate setting, with infrastructure set up for running an internal PyPI.

That's a good audience to point to the fact that the PSF's Packaging WG has this listed as a fundable project: https://github.com/psf/fundable-packaging-improvements/blob/master/FUNDABLES.md#architecture-to-support-alternative-authentication-methods-in-packaging-tools

Please contact the Packaging WG by emailing packaging-wg@python.org to ask us to estimate how much one of these improvements would cost; we'll get back to you within a few business days.

jpedrick commented 1 year ago

I made an attempt at resolving this with minimal changes to pip itself: https://github.com/pypa/pip/commit/0205e2e7a18156972ca975baa404a01387123895

@pfmoore I'd love your feedback as to whether you think this would resolve the requirements you listed here.

My hope for this is that users would be able to supply completely custom authentication headers for AWS S3 or, say, Kerberos authentication over HTTP. All the implementation details would be up to the auth override module developer.

The basic assumption in my initial implementation about pip internals is that there will be a module with an "AuthBase" class to implement. This isn't strictly necessary, as it would also be possible to define a class with a __call__ "hook", supplied to MultiDomainBasicAuth, which gets a first look at the request URL and returns None if it's uninterested in the URL.
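The `__call__` hook variant described above can be sketched independently of pip: each hook gets the first look at the URL and returns None if it is not interested, and the default behaviour (basic auth, in pip's case) applies only when every hook declines. This is not the code from the linked commit; `HookedAuth` and the header-returning hook signature are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional, Sequence

# A hook sees the URL first and returns headers, or None if it is not interested.
AuthHook = Callable[[str], Optional[Dict[str, str]]]


class HookedAuth:
    """Try custom hooks before falling back to the default (e.g. basic auth) behaviour."""

    def __init__(self, hooks: Sequence[AuthHook], fallback: AuthHook) -> None:
        self.hooks = list(hooks)
        self.fallback = fallback

    def headers_for(self, url: str) -> Dict[str, str]:
        for hook in self.hooks:
            headers = hook(url)
            if headers is not None:
                return headers  # the first interested hook wins
        # No hook claimed the URL: use the default handler, if it returns anything.
        return self.fallback(url) or {}
```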