venthur opened this issue 7 years ago.
Adding the word authentication so that this shows up in my searches. :)
Note that you can pass the --index-url option containing the login/password via the PIP_INDEX_URL environment variable.
Also wanted to point out that you can pass --extra-index-url with PIP_EXTRA_INDEX_URL, and this will override the value set in requirements.txt. So you can have a bare URL for source control and then set the environment value with secrets in your CI/CD.
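A minimal sketch of that split, assuming the CI system exposes the full extra index URL (credentials included) as a secret variable; the name CI_SECRET_EXTRA_INDEX_URL is hypothetical. The helper only prepares the arguments and environment: the URL goes into PIP_EXTRA_INDEX_URL instead of --extra-index-url, so it stays out of the argument list that tends to end up in logs and process listings.

```python
import sys
from typing import Dict, List, Mapping, Tuple


def build_pip_invocation(
    base_env: Mapping[str, str],
    requirements: str = "requirements.txt",
    secret_var: str = "CI_SECRET_EXTRA_INDEX_URL",  # hypothetical CI secret name
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running pip with the extra index taken from a CI secret."""
    env = dict(base_env)  # copy, so the caller's environment is not modified
    secret_url = env.get(secret_var)
    if not secret_url:
        raise KeyError(f"{secret_var} is not set")
    # pip reads PIP_EXTRA_INDEX_URL from its environment; nothing secret goes in argv.
    env["PIP_EXTRA_INDEX_URL"] = secret_url
    argv = [sys.executable, "-m", "pip", "install", "-r", requirements]
    return argv, env
```

In a CI job you would then call subprocess.run(argv, env=env, check=True) with argv, env = build_pip_invocation(os.environ).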
I would not recommend setting PIP_INDEX_URL with a password unless you know you are using -q, since otherwise the index URL is logged, including the password.
Not in pip 18.1 or later.
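For anyone curious what masking a logged URL looks like, here is a small sketch in the same spirit using urllib.parse. It is an illustration, not pip's own helper, and the exact "****" placeholder is an assumption: with user:password it keeps the user name, and a lone user part is masked entirely because it may itself be a token.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url_password(url: str) -> str:
    """Mask the secret part of a URL's userinfo before logging it (illustrative only)."""
    parts = urlsplit(url)
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    if not sep:
        return url  # no userinfo, nothing to hide
    user, colon, _password = userinfo.partition(":")
    # With "user:password" keep the user name; a lone user part may itself be a token.
    masked = f"{user}:****" if colon else "****"
    return urlunsplit(parts._replace(netloc=f"{masked}@{hostport}"))
```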
Thank you for adding the expansion of environment variables in requirements files. However, I was wondering whether pip could support environment variables for credentials the way Twine does. With Twine (especially in CI) you just need to set TWINE_USERNAME and TWINE_PASSWORD as environment variables in the CI, so there's no need to add the username and password to the repository URLs.
Just curious.
There's keyring support that's integrated and up for the next release -- #5948.
I'm going to submit a PR to accept credentials from environment variables, to make it much simpler to integrate safely with CI servers. Credentials should not be specified as command-line options in any way, as they may easily be leaked in logs or seen in process listings.
Hello, still open...
The PR was closed, even though PIP_PASSWORD was a nice idea.
Feel free to propose a new one if you think it is a good idea.
Sorry, I don't have the knowledge to implement this PR. Is it possible to resubmit it? It's quite strange that this serious security issue still exists.
Wanted to chime in and voice support for this. It looks like the PR was closed, but maybe is still a viable option.
What would be the process for reviving it? Could someone else just open a new copy of the existing PR, or should we wait and see if @lhupfeldt can revive it?
You are welcome to reuse/reopen my PR. I originally gave up because there was some resistance from core developers.
(Reply sent by email on Wed, 22 Jul 2020, to Tim Orme's comment above.)
Hello, I think the PR was fantastic, and it works similarly to twine. I do not like putting my credentials on the file system at all. Please resubmit it. I don't have knowledge of this kind of procedure.
You'll hit the same resistance again, I suspect. This is precisely the sort of issue that keyring support was intended to avoid - pip needing to implement multiple mechanisms for handling authentication, each for a particular (entirely valid) use case. If keyring support (or keyring itself) isn't sufficient for this use case, we should be improving them, not implementing an alternative mechanism.
I can't find any article about how to use keyring for "Python3 pip + virtualenv". Could you point me to a link?
If keyring support (or keyring itself) isn't sufficient for this use case, we should be improving them, not implementing an alternative mechanism.
This is a reasonable request, but it also doesn't seem like keyring is designed for the problem of CI/CD or automated builds or however you want to think of the issue that env var auth is trying to solve. The fact that it needs a "headless linux" section seems like an indicator of this. Maybe I'm wrong. Either way, this comment from the PR seems to capture my situation nicely:
If I understand correctly, you expect me to: 1) install the Python keyring package on my systems (without using pip); 2) implement my own keyring backend and install that on my systems (without using pip); 3) configure keyring on all my systems to use my special backend.
That sounds like so much effort that I would rather munge my pip config/requirements.txt to inject credentials on the fly during builds.
If specifying explicit environment variables that pip will use for auth feels like a slippery slope, another option that seems to work for npm would be to attempt to replace things that look like environment variables in configuration files. This would at least allow one to set index-url with userinfo parts like https://${MY_USER}:${MY_PASS}@secretpypi.example.org and inject the relevant variables in CI. I don't know the pip code base at all or whether that is feasible; just a thought that occurred to me. It also has the advantage of allowing the user to auth against multiple indexes if necessary by specifying different env vars for each.
it also doesn't seem like keyring is designed for the problem of CI/CD or automated builds or however you want to think of the issue that env var auth is trying to solve
Have you raised that with the keyring project? Honestly, that's all we're suggesting here, and we're getting a lot of pushback. Pip added support for keyring, in a good-faith attempt to handle the requests we were getting for a mechanism to store credentials outside of pip¹. We were led to understand that the submitted PR was a good solution to this issue, and we took it on trust that keyring did the job we'd been told that it did.
To date, no-one has demonstrated that keyring isn't up to the job. Certainly, we've had people say that keyring doesn't support their use case. But nor does pip - someone will need to write new code, and the idea of adding keyring support was to delegate handling this sort of use case to that project. Until we see a definite statement from them that they aren't interested in supporting the use cases being described here, there's not much pip should be doing (IMO). If keyring come back and say they don't want to support this use case, then pip needs to look at what to do - and in my view, I'd want to reconsider whether we should be looking for an alternative to keyring that does support our users - I still don't want pip to get into the business of credential management².
attempt to replace things that look like environment variables in configuration files
You can use environment variables in requirements files. Have you tried using that feature to see if it handles your use case?
¹ The approach of using keyring had the additional benefit of not requiring the pip developers to get into questions around what is a secure way of handling credentials - we could leave that to the experts maintaining the keyring project. ² Yes, environment variables seem like a simple enough solution, and safe enough. But I'm not an expert, so I'm not going to make that decision. That's basically the point here...
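For reference, the requirements file format documents ${VAR} expansion: only the ${NAME} form is expanded, with names limited to uppercase letters, digits and underscores, and a variable that is not set is left as-is. The sketch below mimics that rule with a regular expression; it is not pip's code, and the variable names in the usage are hypothetical.

```python
import os
import re
from typing import Mapping, Optional

# Only ${NAME} is recognised; NAME is limited to A-Z, 0-9 and "_".
ENV_VAR_RE = re.compile(r"\$\{([A-Z0-9_]+)\}")


def expand_env_vars(line: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${NAME} placeholders in one requirements-file line."""
    env = os.environ if environ is None else environ

    def substitute(match: "re.Match[str]") -> str:
        value = env.get(match.group(1))
        # Leave the placeholder untouched when the variable is unset or empty.
        return value if value else match.group(0)

    return ENV_VAR_RE.sub(substitute, line)
```

So a line like --extra-index-url https://${PIP_USER}:${PIP_PASS}@pypi.internal.example/simple/ can be committed as-is, with the values supplied by CI. Passwords containing characters such as @, : or / would need to be percent-encoded in the variable's value.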
Thanks for your reply. I clearly only have a tiny fragment of the context here. I'll try to find an appropriate place to ask about keyring in CI. Also, I did not know that requirements files would expand environment variables. I had tried it within pip.conf, which does not appear to expand them. I can probably get the job done through that mechanism!
I appreciate the perspective of leaving credential management to experts and your efforts to keep things going in the right direction.
@pfmoore chiming in here a bit, and I don't want to speak too much for others, but one of the cases mentioned elsewhere is that you end up in a bit of a catch-22 situation that can't be resolved without some hacking, unless pip itself supports this.
If you're in a CI/CD environment where you only have access to a private, password-protected PyPI repo, then you are in an unfortunate situation where you can't even install keyring to begin authenticating to that repo, even if it does support it.
There are perhaps other ways to get keyring installed, but they end up being a bit messy. Maybe an alternative is to ship keyring with pip or something along those lines, but I'm not sure of the feasibility or impact of that.
In short though, the concern is that if pip doesn't support that auth, and the only way to support auth is to install an external package, then we end up stuck in the cases where the external package requires auth.
I think the "pushback" is because environment variables are a normal way of doing this and keyring is not. Several replies here have outlined the specific issues with adding this as a dependency. Given that, rather than requiring a "definite statement" from the keyring project (and who is going to obtain such a statement?), it would make more sense to explain how keyring is a good solution particularly for CI/CD, as it is basically an exception we would make from the norm in order to use this tool (i.e. this is the only similar tool that would use keyring...).
I still don't want pip to get into the business of credential management
Environment variables don't put you in the business of credential management. Something else is responsible for setting the environment variables; that's the point of environment variables.
Realistically the alternative here is not going to be keyring. An alternative is using PIP_INDEX_URL (an environment variable) with basic auth embedded in it. Which means you already take credentials in an environment variable, so the concern there is odd. And you already fixed the logging of the credentials in the URL several versions ago I think. The problem with this approach is simply that the entire URL now becomes a secret value, rather than just the credentials. I think this request is simply to split the URL from the credentials so that the URL could be hardcoded in a checkin without the credentials.
It sounds like environment expansion in requirements.txt is potentially superior.
@TimOrme
Maybe an alternative is to ship keyring with pip or something along those lines, but I'm not sure of the feasibility or impact of that.
You're right - bootstrapping keyring is an issue. But it was known (and acknowledged) when the feature was added, so all I can really say is that the original implementation saw that as an acceptable limitation. I don't personally have a good answer here.
Vendoring keyring is unfortunately not possible, because keyring depends on C extensions, and pip cannot vendor C extensions (because pip needs to be platform-neutral - there's a lot more background here, but that's the reality and it's not going to change, unfortunately).
@jasonstitt
and who is going to obtain such a statement?
Someone who needs this to work, surely? You seem to be assuming that it's up to the pip developers. Sorry, but it really isn't.
Which means you already take credentials in an environment variable, so the concern there is odd.
OK, I've no problem if you think my reluctance is odd. Feel free to take it as simply meaning that I won't do anything about this myself, if that helps.
It sounds like environment expansion in requirements.txt is potentially superior.
It does indeed sound like that is helpful for people in this situation. Which makes me wonder why no-one found that information. Is the section here in the documentation unclear? Is it hard to find? It may be that people have wasted time debating keyring, when if they'd found the existing feature they could have solved their problem much more easily - so if there's any improvement to the documentation that would have helped, it would be great if you could offer a suggestion (ideally as a PR, but even just an issue describing what you'd like to have seen would be good).
I guess that most people just put 'package>=version' (or == ...) in requirements.txt. I definitely would never have had the idea to look at the requirements.txt specification in order to pass credentials.
Maybe some reference to requirements.txt from the existing documentation about how to authenticate, and an explanation of what to put in requirements.txt to make pip read credentials from there would help?
But I hope you are not referring to embedding credentials in index URLs or even adding index URLs in requirements.txt? Credentials in URLs are generally considered insecure.
Adding URLs in requirements.txt would, for me, just make the file unreadable, with even more substitutions to be made. We have production and non-production PyPI proxies, and I think other people will have the same. It would also mean that every requirements.txt would have to add the credentials, versus just having to add them on the CI server.
I understand that you are trying to limit the maintenance burden of pip, but we are talking 10 lines of code including logging (excluding the test) (and that would be maybe 7 if the check that both password and username is set was removed, as suggested) and 5 lines of documentation.
And this feature seems to be in popular demand.
@pfmoore you've said (emphasis mine):
Pip added support for keyring, in a good-faith attempt to handle the requests we were getting for a mechanism to store credentials outside of pip¹.
And I think this is the misunderstanding in this discussion. We aren't asking for a mechanism to store credentials outside pip. We already have that one (e.g. the credential store in our CI server), and our mechanism, whatever it is, provides the credentials in the form of environment variables (which is very common). However, this mechanism is not keyring. The problem we face is then: how do we pass these credentials, that are already in environment variables, to pip?
The option to make pip to use keyring directly is very nice and solves a valid, but different problem, which is how to take credentials from keyring and pass them to pip.
This probably isn't to everyone's standard, but I do this to store credentials as environment variables.
@lhupfeldt will you be submitting a new PR for this? If not I can try to port your old PR changes to a new PR.
@pradyunsg It is currently impossible to use keyring in a scenario with a private PyPI repository and a CI environment, so it would be good to have pip follow twine, flit, and poetry in allowing the use of environment variables for authentication.
Okay, I wanna split this into two segments. First off, trying to better understand people's usecase.
Is the usecase of folks commenting here:
I have credentials in a credential store that I want pip to use.
Use 👍🏽 to say yes, and 😕 to say no. If you say no, please upvote a follow up comment about what the usecase is (or drop a new comment, if nothing covers that usecase).
I have environment variables (e.g. from CI) containing credentials. pip should be able to grab those.
I have environment variables (e.g. from CI) containing credentials.
This isn't sufficient information to understand your usage pattern. Where do these credentials come from? Which/How many package indexes do you interact with in a single pip execution?
Twine and Flit operate on a single domain/package index at a time, and they can safely assume/bake in the assumption that using a single credential pair is sufficient.
Poetry uses a name for their package indexes, which allows them to namespace the environment variable: POETRY_HTTP_BASIC_{REPOSITORY_NAME}_PASSWORD.
pip has neither of these conveniences -- and no one has come up with a feasible approach to cover the usage patterns that we know are possible with credential management. Our keyring integration solves that by allowing users to use a credential store, have different credentials for different domains, and tell pip to use that credential store directly via a keyring plugin.
If someone can figure out a design pattern that addresses this need, while still working with multiple package indexes, I'd be very happy to get a PR for this. Basically, if someone has a corporate index and PyPI configured, it should be possible to tell pip via environment variables that it shouldn't use any credential pair for pypi.org and what the correct credentials are for pypi.internal.the-organisation-that-pays-me-for-doing-this.com (there's no need to educate me on the dependency confusion angle about this setup).
The option to make pip to use keyring directly is very nice and solves a valid, but different problem, which is how to take credentials from keyring and pass them to pip.
Alright, this statement is an oxymoron from my perspective.
The described usecase is exactly what keyring (the Python package) solves. That package is exactly this bridge that you want to use environment variables for instead. It ships with support for the common system credential stores, and allows users to write third-party backends for whatever other credential stores they may wish to use -- https://github.com/jaraco/keyring#third-party-backends. This moves the complexity of "figure out how to get things from the credential store" out of pip, and into the keyring package -- which is developed by folks who are not pip maintainers and have the expertise to design and evolve that solution.
I acknowledge that there's a bootstrapping concern, but that is a part of figuring out how to install pip within the constraints of your organisation's environment securely¹ -- it's a fixed cost to be paid, and a consequence of the security model adopted.
¹ You can use pip install --no-index {path-or-URL-to-an-accessible-wheel-for-keyring} {path-to-your-custom-credential-store-interacting-keyring-backend}, in case you're wondering how to install a Python package offline.
That package is exactly this bridge that you want to use environment variables for instead.
To elaborate on this point - can people explain (in as much detail as seems necessary) why it's not possible for them to use/write/contribute a keyring backend that picks up credentials from environment variables (i.e., treating those environment variables as the "store").
Things that aren't valid problems (IMO):
I'm overall ambivalent on this -- this discussion has a weird mix of misrepresenting what pip's keyring integration does and never getting an update on what the underlying design constraints are -- it seems like a reasonable request but all the proposed solutions so far seem infeasible to me.
This issue never got a "proper" update on the current credential management story for pip after the keyring support got added, so... I guess I just posted that above.
I think the next steps here are:
keyring library) isn't good-enough here.
Vendoring keyring is unfortunately not possible, because keyring depends on C extensions, and pip cannot vendor C extensions (because pip needs to be platform-neutral - there's a lot more background here, but that's the reality and it's not going to change, unfortunately).
It also breaks the fundamental assumption -- keyring is a Python package with a programmatic API that allows users to import things from it to write a third-party backend. Vendoring it breaks that, eliminating the primary benefit of it -- externally maintained third-party backends for interacting with different credential stores.
Also... https://pip.pypa.io/en/stable/topics/authentication/ is a thing now, and I'll file a follow-up issue to add a cross-reference to https://pip.pypa.io/en/stable/reference/requirements-file-format/#using-environment-variables there.
And, finally, please be mindful that pip is primarily maintained by volunteers.
This isn't sufficient information to understand your usage pattern.
Ok, I will give it a try.
Where do these credentials come from?
The credentials are set in the currently running (shell) environment, i.e. as normal environment variables. For example, you can fetch them with os.getenv('MY_PIP_USERNAME') and os.getenv('MY_PIP_PASSWORD').
Which/How many package indexes do you interact with in a single pip execution?
Usually two, the default one and a private one.
The part I still fail to understand is why a separate environment variable is needed in the first place, since it is already possible to specify auth in PIP_INDEX_URL and the like. The only use case I can come up with is when repository URLs are set in configuration files and one wants to supplement the auth part without writing that configuration again, which is already slightly weird, but still rather easily achievable with something like
export PIP_INDEX_URL=$(pip config get global.index-url | sed "s|//|&${PIP_USER}:${PIP_PASS}@|")
Note that this is more or less what we would do if the support were built into pip; there's nothing hacky about this, or rather, there's nothing magical about having this implemented in pip instead of an ad-hoc Bash one-liner.
So I think the bottom line is that we need more concrete, objective reasons to explain exactly why this is a needed feature, rather than subjective "I think pip should pick it up".
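For comparison, the same injection as a small Python sketch. It assumes the credentials arrive in PIP_USER and PIP_PASS as in the one-liner, and percent-encodes them with urllib.parse.quote so that characters such as @, : or / in a password don't corrupt the URL (a sed substitution would break on those). Getting the base URL (e.g. from pip config get global.index-url) is left to the caller.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(index_url: str, user: str, password: str) -> str:
    """Return index_url with percent-encoded user:password inserted into the netloc."""
    parts = urlsplit(index_url)
    # Drop any userinfo already present; keep host and port exactly as written.
    hostport = parts.netloc.rpartition("@")[2]
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{hostport}"))
```

Usage would be along the lines of os.environ["PIP_INDEX_URL"] = with_credentials(base_url, os.environ["PIP_USER"], os.environ["PIP_PASS"]) before invoking pip.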
Usually two, the default one and a private one.
And you want pip to use the credentials to access both of them?
use/write/contribute a keyring backend that picks up credentials from environment variables (i.e., treating those environment variables as the "store").
It's got the same design constraints as pip's needs here, so... honestly, yea... that's quite possibly one of the better outcomes here -- it'll likely even work transparently with twine / flit / poetry etc if you do this right. :)
use/write/contribute a keyring backend that picks up credentials from environment variables (i.e., treating those environment variables as the "store").
That could be a solution to my use case, if I can work out the keyring bootstrapping problem for my CI environment. It shouldn't be too hard to write a backend doing this, after looking at some existing keyring backend implementations. Working out how to publish it on pypi.org seems like a bigger challenge :laughing:
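A rough sketch of such a backend, assuming keyring's documented plug-in pattern (subclass keyring.backend.KeyringBackend, set a priority, implement get_password/set_password) plus an override of get_credential so no username has to be supplied. The variables ENV_KEYRING_URL_PREFIX, ENV_KEYRING_USERNAME and ENV_KEYRING_PASSWORD are hypothetical. Credentials are only returned for services matching the configured prefix, so nothing is handed out for pypi.org. Registering the backend with keyring (entry point or keyring config) is not shown, and the fallback stand-ins only exist so the sketch runs without keyring installed.

```python
import os
from typing import Optional
from urllib.parse import urlsplit

try:
    from keyring.backend import KeyringBackend
    from keyring.credentials import SimpleCredential
except ImportError:  # keyring not installed: tiny stand-ins so the sketch still runs
    KeyringBackend = object

    class SimpleCredential:
        def __init__(self, username: str, password: str) -> None:
            self.username = username
            self.password = password


class EnvVarKeyring(KeyringBackend):
    """Read-only backend that treats environment variables as the credential store."""

    priority = 1  # arbitrary choice for this sketch

    def _lookup(self, service: str) -> Optional[SimpleCredential]:
        # Use a prefix ending in "/" so e.g. pypi.internal.example.evil.com cannot match.
        prefix = os.environ.get("ENV_KEYRING_URL_PREFIX", "")
        user = os.environ.get("ENV_KEYRING_USERNAME")
        password = os.environ.get("ENV_KEYRING_PASSWORD")
        if not (prefix and user and password):
            return None
        # The caller may ask with the full index URL or with just the host (netloc).
        if service.startswith(prefix) or service == urlsplit(prefix).netloc:
            return SimpleCredential(user, password)
        return None

    def get_credential(self, service: str, username: Optional[str]) -> Optional[SimpleCredential]:
        cred = self._lookup(service)
        if cred is None or (username is not None and username != cred.username):
            return None
        return cred

    def get_password(self, service: str, username: str) -> Optional[str]:
        cred = self.get_credential(service, username)
        return cred.password if cred else None

    def set_password(self, service: str, username: str, password: str) -> None:
        raise NotImplementedError("EnvVarKeyring is read-only")

    def delete_password(self, service: str, username: str) -> None:
        raise NotImplementedError("EnvVarKeyring is read-only")
```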
Twine and Flit operate on a single domain/package index at a time, and they can safely assume/bake in the assumption that using a single credential pair is sufficient.
Poetry uses a name for their package indexes, which allows them to namespace the environment variable: POETRY_HTTP_BASIC_{REPOSITORY_NAME}_PASSWORD.
pip has neither of these conveniences -- and no one has come up with a feasible approach to cover the usage patterns that we know are possible with credential management. Our keyring integration solves that by allowing users to use a credential store, have different credentials for different domains, and tell pip to use that credential store directly via a keyring plugin.
@pradyunsg considering that an environment variables implementation for either pip or keyring would have the same issues regarding naming of environment variables, would this be sufficient to support multiple index URLs? Is there any objection to using the PIP_ prefix?
PIP_INDEX_AUTH_URL_0=https://index0.example.com
PIP_INDEX_AUTH_USERNAME_0=myusername
PIP_INDEX_AUTH_PASSWORD_0=mypassword
PIP_INDEX_AUTH_URL_1=https://index1.example.com
PIP_INDEX_AUTH_USERNAME_1=myusername1
PIP_INDEX_AUTH_PASSWORD_1=mypassword1
or this might be a better way
PIP_INDEX_AUTH_0_URL=https://index0.example.com
PIP_INDEX_AUTH_0_USERNAME=myusername
PIP_INDEX_AUTH_0_PASSWORD=mypassword
PIP_INDEX_AUTH_1_URL=https://index1.example.com
PIP_INDEX_AUTH_1_USERNAME=myusername1
PIP_INDEX_AUTH_1_PASSWORD=mypassword1
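To make the second layout concrete, here's a sketch of how a tool could collect the PIP_INDEX_AUTH_<n>_URL/USERNAME/PASSWORD variables and pick credentials for a request URL. These names are only the proposal above, not something pip reads today; matching by longest URL prefix is an assumption. Indexes without an entry (e.g. pypi.org) simply get no credentials.

```python
import re
from typing import Dict, List, Mapping, Optional, Tuple

_AUTH_VAR_RE = re.compile(r"PIP_INDEX_AUTH_(\d+)_(URL|USERNAME|PASSWORD)")
Entry = Tuple[str, str, str]  # (url, username, password)


def load_index_auth(environ: Mapping[str, str]) -> List[Entry]:
    """Group PIP_INDEX_AUTH_<n>_* variables into (url, username, password) entries."""
    groups: Dict[int, Dict[str, str]] = {}
    for name, value in environ.items():
        match = _AUTH_VAR_RE.fullmatch(name)
        if match:
            groups.setdefault(int(match.group(1)), {})[match.group(2)] = value
    entries: List[Entry] = []
    for index in sorted(groups):
        fields = groups[index]
        # Skip incomplete groups instead of guessing the missing part.
        if all(key in fields for key in ("URL", "USERNAME", "PASSWORD")):
            entries.append((fields["URL"], fields["USERNAME"], fields["PASSWORD"]))
    return entries


def credentials_for(url: str, entries: List[Entry]) -> Optional[Tuple[str, str]]:
    """Return (username, password) for the longest matching URL prefix, or None."""
    target = url.rstrip("/") + "/"
    matches = []
    for prefix, user, password in entries:
        # Compare on a trailing "/" so index0.example.com.evil does not match.
        base = prefix.rstrip("/") + "/"
        if target.startswith(base):
            matches.append((len(base), user, password))
    if not matches:
        return None
    _, user, password = max(matches)
    return user, password
```

In practice this would be called as credentials_for(request_url, load_index_auth(os.environ)).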
Hmm, looking at the pip docs on keyring support, I can't see where to specify a username when installing a package using keyring auth. Keyring allows me to set multiple username/password credentials for a single service/index-url. Is this specified elsewhere in the docs, @pradyunsg?
It is rare to see this many people take interest in an issue :)
It was mentioned that pip is a volunteer effort. I think everybody understands this, and I did submit a PR for this, complete with tests and documentation.
I think neglecting the issue of pip requiring installation of a package is really bad (I'm aware of the workaround). In a large company CI setup it quickly becomes a mess if the CI installation also has to take care of installing the build tools for individual and very diverse projects which use a lot of different technologies. At my company, individual projects do not have OS login to the CI servers, and the servers do not have internet access, so all package/tool installation is done by the CI server and goes through our local repositories.
The package installer should not depend on a package.
I think the issue of supporting multiple indexes can be seen as an extension, so maybe we could start by just documenting that multiple indexes are not (currently) supported through env variables. If you think supporting this is required before accepting a PR, then we can add that. I have no need for it, and I think most people won't. If you have a private protected repository, you can probably proxy all indexes through that.
Please take a look at @absassi's comment which explains very nicely why supporting env variables is a good idea, and not a competitor to keyring.
For those suggesting embedding credentials in URLs, please read e.g. this: https://neilmadden.blog/2019/01/16/can-you-ever-safely-include-credentials-in-a-url/
Please take a look at @absassi's comment which explains very nicely why supporting env variables is a good idea, and not a competitor to keyring.
Please take a look at my comment which quotes that, and mentions why the proposed PR wasn't sufficient either. :)
See also https://github.com/pypa/pip/pull/6723#issuecomment-513504035
Is there any objection to using the PIP_ prefix?
I'm fine either way.
This is going to have to be distributed separately from pip, so it should be reasonable to pick something generic; but either way, it shouldn't be that difficult to make changes / allow making changes to that prefix. :)
Please take a look at @absassi's comment which explains very nicely why supporting env variables is a good idea, and not a competitor to keyring.
Please take a look at my comment which quotes that, and mentions why the proposed PR wasn't sufficient either. :)
See also #6723 (comment)
Sorry @pradyunsg, which of your comments are you referring to?
I see that @reixd directly accesses a public repo (pypi.org?) and a private one. In that case my implementation would leak the credentials to pypi.org (as documented, but who reads the documentation :) ). A solution that also checks the index URL would definitely be better in that case. This exact scenario could be handled by always attempting access without credentials first, but of course this would not handle multiple protected repositories requiring different credentials. I'm not sure whether this is a real issue, though.
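The "try without credentials first" idea can be sketched like this. The fetch and lookup callables are hypothetical stand-ins injected for illustration (a real client would make HTTP requests); the point is that credentials are only sent after the server answers 401, and only if the lookup knows that index. As noted, this alone does not cover several protected indexes with different credentials.

```python
from typing import Callable, Optional, Tuple

Credentials = Tuple[str, str]
# fetch(url, credentials_or_None) -> (status_code, body); a hypothetical transport.
Fetch = Callable[[str, Optional[Credentials]], Tuple[int, bytes]]


def fetch_anonymous_first(
    url: str,
    fetch: Fetch,
    lookup: Callable[[str], Optional[Credentials]],
) -> Tuple[int, bytes]:
    """Try url without credentials; retry once with credentials only on HTTP 401."""
    status, body = fetch(url, None)
    if status != 401:
        return status, body  # public index (e.g. pypi.org): no credentials were sent
    creds = lookup(url)
    if creds is None:
        return status, body  # no credentials known for this index
    return fetch(url, creds)
```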
An index-checking solution should allow patterns like *.mydomain.host and *.mydomain.host/p1 and choose the best match.
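A sketch of that matching rule under a few assumptions: each pattern is a host glob (matched with fnmatch against the hostname only, so * cannot swallow part of the path), optionally followed by a path prefix, and the most specific match wins (longest path prefix first, then longest host pattern). The pattern syntax is hypothetical, not an existing pip feature.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

Credentials = Tuple[str, str]


def best_match(url: str, patterns: Dict[str, Credentials]) -> Optional[Credentials]:
    """Return the credentials whose pattern most specifically matches url, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") + "/"
    best_key: Optional[Tuple[int, int]] = None
    best_creds: Optional[Credentials] = None
    for pattern, creds in patterns.items():
        host_glob, _, path_prefix = pattern.partition("/")
        # "p1" becomes "/p1/" so that "/p10/" does not count as a match.
        prefix = "/" + path_prefix.strip("/") + "/" if path_prefix.strip("/") else "/"
        if not fnmatchcase(host, host_glob.lower()) or not path.startswith(prefix):
            continue
        key = (len(prefix), len(host_glob))  # more specific patterns sort higher
        if best_key is None or key > best_key:
            best_key, best_creds = key, creds
    return best_creds
```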
This is going to have to be distributed separately from pip, so it should be reasonable to pick something generic; but either way, it shouldn't be that difficult to make changes / allow making changes to that prefix. :)
But the package will be pip-specific, so it makes sense to use PIP_. In any case, I still can't determine what happens when keyring has multiple username/password pairs defined for a single index-url. Does pip just pick one at random?
@lhupfeldt would my example for env vars handle all the cases for multiple public and private index URLs if it were implemented directly in pip? Public URLs wouldn't have any PIP_INDEX_AUTH_URL_ or other env vars defined, while private indexes requiring auth would get a full mapping of index-url/username/password. Due to constraints on the naming of environment variables, it's not possible to embed a URL directly in an environment variable name. As @pradyunsg mentioned previously, poetry can do this because it has a mapping specified in pyproject.toml for the index-url/repo name, but this is something which pip would never support.
PIP_INDEX_AUTH_URL_0=https://index0.example.com
PIP_INDEX_AUTH_USERNAME_0=myusername
PIP_INDEX_AUTH_PASSWORD_0=mypassword
PIP_INDEX_AUTH_URL_1=https://index1.example.com
PIP_INDEX_AUTH_USERNAME_1=myusername1
PIP_INDEX_AUTH_PASSWORD_1=mypassword1
@wwuck I think your solution of matching the index URL against PIP_INDEX_AUTH_URL_<n> and then getting credentials from the corresponding ...USERNAME_<n>/...PASSWORD_<n> is fine. I would like it to allow glob pattern matching on the URL, because I think that when multiple private indexes are used, it is likely that they share the same credentials.
Hmmm, so after reading https://github.com/pypa/pip/issues/10389 I guess I should hold off on trying to implement a keyring backend for environment variables.
I don't think that credential helper API would happen anytime soon to solve your problem. If you start developing a keyring backend now, you'd probably be like version 3.0 when that API is released.
Description:
We're using pip in a CI/CD pipeline to install packages from a private repository protected by username/password. Currently there are two options to pass those credentials to pip: either encode them directly in the URL or create a pip.conf file. Neither option is very attractive. The first would mean having those credentials hard-coded in the source code; the second would mean we'd have to generate this config file during the build process. Most CI/CD build pipelines support some kind of "secret variables", which is a fancy word for environment variables that you can set in the CI/CD and that will be enabled in the build pipeline. This is usually the way to pass secrets.
It would be very helpful if pip would also support some mechanism to read secrets from environment variables. See also: https://www.jfrog.com/confluence/display/RTF/PyPI+Repositories#PyPIRepositories-UsingCredentials for a realistic use case.