pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0
3.55k stars 956 forks

Add support for API keys #994

Closed edmorley closed 5 years ago

edmorley commented 8 years ago

A scary number of people embed their PyPI username and password in their Travis config (using Travis encrypted variables), to enable automatic releases for certain branches (Travis even has a guide for it).

In addition, the packaging docs example encourages users to save their password in plaintext on disk in their .pypirc (they can of course use twine's password prompting, but I wonder how many read that far, rather than just copy the example verbatim?)

Whilst in an ideal world credentials of any form wouldn't be saved unencrypted to disk (or given to a third-party such as Travis) and instead users prompted every time - I don't think this is realistic in practice.

API keys would offer the following advantages:

  1. Higher-entropy credentials that are guaranteed to have not been reused on multiple sites.
  2. The ability to give the API key a smaller permissions scope than that of the owner's username/password. For example an API key would not be permitted to change a user's listed GPG key or in the future, their 2FA settings. Or an API key could be limited to a specific package.
  3. Since this would be separate from the existing username/password auth, a signing based approach (eg HMAC) could be used, without breaking older clients. This would ensure that if a connection was MiTMed (eg due to a protocol or client exploit), the API key itself would still remain secure.
  4. Eventually support could be dropped for the password field in .pypirc, leaving a much safer choice between password prompting every time, or creating an API key that could be saved to disk.
  5. If/when support is added for 2FA, users who need to automate PyPI uploads won't have to forgo 2FA for their whole account. They could instead choose to just create a 2FA-circumventing API key for just the one package that needs uploads in automation.
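As a rough illustration of the signing idea in point 3, here is a minimal sketch using HMAC-SHA256 from the Python standard library. The function names, the choice of signed fields (method, path, body digest), and the example path are all assumptions made for illustration; nothing here is an existing PyPI or Warehouse API.

```python
import hashlib
import hmac


def sign_request(api_secret: bytes, method: str, path: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the method, path, and body digest."""
    body_digest = hashlib.sha256(body).hexdigest()
    message = "\n".join([method.upper(), path, body_digest]).encode("utf-8")
    return hmac.new(api_secret, message, hashlib.sha256).hexdigest()


def verify_request(
    api_secret: bytes, method: str, path: str, body: bytes, signature: str
) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(api_secret, method, path, body)
    return hmac.compare_digest(expected, signature)
```

Only the signature travels with the request, so an attacker who can read the connection sees a value tied to that one request rather than the API secret itself.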

Many thanks :-)

(I've filed this against warehouse since I'm presuming this is beyond the scope of maintenance-only changes being made to the old PyPI codebase)

dstufft commented 8 years ago

This is another thing I've been wanting to do, but is likely a post-launch task. I'm a bit on the fence of how to exactly handle them, but one option I've been thinking about is instead of API keys, using client certificates for TLS which would give built in support for a signing based approach, high entropy, allow it to be used for all uploads (could store it password protected typically, and just offer a password-less option for automation), and expiration of the token.

One problem with this is that it would mean we can't route uploads through our CDN, however uploads don't really gain anything by going through the CDN (and in fact, it's a bit harmful since uploads need a longer timeout than normal requests, we're forced to have high, 20+ second timeouts on upload).

I've also considered something like OAuth here instead of just an API Key which would solve a lot of these problems as well, in addition to making it possible to securely grant other projects the ability to modify just one package (or one scope inside of that package).

There's also the likely future signing tool, TUF, where we could just enforce that all uploads must be signed by a valid key for that author, and use that key as the authentication.

A lot of different options here, which is another reason why it's likely a post-launch task :)

lukesneeringer commented 6 years ago

I really really want to get my PyPI information out of CI. At the risk of responding to a years-old thread...I want to volunteer to do this work (as well as #996). :-)

At this point, Warehouse is launched (albeit in beta), and the legacy upload endpoint is deprecated. I assert it would be reasonable to add this, although others who have actually been thinking about this for more than a few days might know better than I (so feel free to chime in and tell me!).

Before reading @dstufft's comments above, my thought was, "Implement API keys", but I have nothing against the idea of certificates.

Here is what I think is needed (sub out "key" below with "certificate" if we go that route):

Would anyone have any objection to my taking some time to scope this out further, with an eye to getting the work in soon-ish? (Since I am new around here, it would probably require some review cycles from @ewdurbin, @dstufft, etc.)

brainwane commented 6 years ago

Thanks for your note, @lukesneeringer, and sorry for the slow response! Thank you for volunteering to do this work!

As I think you know, but just for context for future folks finding this discussion, the folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable.

So that's what Ernest, Dustin, and Nicole have been concentrating on and will be concentrating on for the next few months. But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then.

Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing.

Thanks again! Talk with you soon.

lukesneeringer commented 6 years ago

But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then.

Sounds good.

My guess is that this is probably work that can be done in parallel to the Warehouse improvements. The trick would be that the keys would not work on legacy PyPI, and therefore anyone using the legacy URL would not be able to use them. (However, I suppose it might be the case that review cycles or whatnot would not be available.)

Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing.

Yep -- we already did that. :-)

brainwane commented 6 years ago

@lukesneeringer Oh great, glad you and Ernest have already started working together!

In our meeting today we said "yay" about you working on this! Please go ahead and start scoping it out and let us know your thought process as you work. I could imagine you finding the SimplySecure resources useful on a UX level.

We also decided that, as a new feature, this belongs in a future milestone. But we will do our level best to review your work as you have it!

Could I please ask you to also comment at #996 to mention there that you're working on it?

Also, will you be at the PyCon sprints?

lukesneeringer commented 6 years ago

That sounds good. I currently plan to be at only day one of PyCon sprints, but I have not booked plane tickets yet, so that is mutable.

brainwane commented 6 years ago

I'm going to be at all four days, and I think a number of other Python packaging/distribution developers will too. I think it'll likely be a good time to hash out architectural stuff and do some pair programming and in-person reviews. So if you could be there two or 2.5 days that would probably be of benefit.

brainwane commented 6 years ago

@lukesneeringer how is this going? Do you have any plans or code that you'd like us to look at?

lukesneeringer commented 6 years ago

@brainwane Hi there; I have been on vacation. I will have a plan (and some code) for you to look at on Friday. :-)

lukesneeringer commented 6 years ago

@brainwane @ewdurbin et al.

I have started doing research and have put a minimal amount of code to paper, but want to bring in other voices at this point.

The API keys themselves

I assert that a new database model should be added to packaging which adds the API keys. My rationale for putting this in packaging rather than in accounts is simply because it is going to have a relationship to Project, and this avoids a circular import (or circular mental reasoning).

Key Contents

As far as the contents of the keys, I am leaning toward using RSA keys, and having the interface essentially allow you to upload the public keys (meaning that initially the user will be responsible for creating said keys). The request would include a signature (signed with the private key) which is validated against the expected signature using the public key.

This approach has a few downsides. The big one is that it puts the burden of generating the key on the package maintainer. We could potentially later do what some other sites do where they provide generation, store the public key in the database, and give a forced one-time download of the private key. I think we should start with user generated keys, however, because it allows users to generate encrypted keys (and store the encryption key in CI).

Data Structure

I propose the following data model:

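from sqlalchemy import CheckConstraint, Column, DateTime, ForeignKey, Text, orm, sql

from warehouse import db
from warehouse.accounts.models import User

# This model is meant to live alongside Project in the packaging models,
# so only User needs importing.


class AccessKey(db.Model):
    """Access keys for project access separate from passwords."""

    __tablename__ = "access_keys"
    __table_args__ = (
        # An access key must be attached to a user, a project, or both.
        CheckConstraint(
            "user_id IS NOT NULL OR project_id IS NOT NULL",
            name="access_keys_user_or_project",
        ),
    )

    # We store a public key, and the client is responsible for signing
    # requests using the corresponding private key.
    public_key = Column(Text, nullable=False)

    # An access key must be attached to either a user or a project,
    # and may be attached to both.
    #
    # Attaching to a user limits the scope of the key to projects which
    # that user can access (at the time access is attempted, not when the
    # key is made). It is possible for this set of projects to be zero.
    #
    # Attaching to a project limits the scope of the key to that project.
    #
    # Nullability belongs on the foreign key columns; orm.relationship()
    # does not accept a `nullable` argument. This assumes both models
    # expose an `id` primary key.
    user_id = Column(ForeignKey(User.id), nullable=True)
    user = orm.relationship(User, backref="access_keys", lazy=False)
    project_id = Column(ForeignKey(Project.id), nullable=True)
    project = orm.relationship(Project, backref="access_keys", lazy=False)

    expiry = Column(
        DateTime(timezone=False),
        nullable=True,
    )

    created = Column(
        DateTime(timezone=False),
        nullable=False,
        server_default=sql.func.now(),
    )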

What is important here is the relationships with user and project -- essentially, a key can be attached to either or both. If attached to a user (and only a user), then uploads may be performed for anything that user can access at the time the upload is attempted. Keys attached to projects (and only a project) may be used for that project with no other authentication. Keys attached to both must meet both restrictions (this implies that the key could provide no privileges whatsoever should the user lose access to the project later).
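The scoping rules in the previous paragraph could be checked along these lines. This is a minimal sketch with plain strings standing in for the User and Project models; `KeyScope` and `key_permits_upload` are hypothetical names, and `user_projects` is assumed to be the set of projects the key's user can access at the moment of the upload.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class KeyScope:
    """Stand-in for an access key's attachments (hypothetical structure)."""

    user: Optional[str] = None
    project: Optional[str] = None


def key_permits_upload(key: KeyScope, project: str, user_projects: Set[str]) -> bool:
    """Return True if the key may upload to `project` right now."""
    if key.user is None and key.project is None:
        # A key must be attached to a user, a project, or both.
        return False
    if key.project is not None and key.project != project:
        # A project-attached key is limited to that one project.
        return False
    if key.user is not None and project not in user_projects:
        # A user-attached key is limited to what the user can access now.
        return False
    return True
```

A key attached to both a user and a project passes only when both checks pass, which is why it grants nothing once the user loses access to the project.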

Implementation

I assert that this would require an additional auth_policy class to be added, which would validate that the API key was sent. If user-based authentication was also performed, or if the key was tied to a specific user, it would return that user, otherwise it would return a stub user object.

Then, logic needs to be added to forklift/legacy.py, the file_upload function, to (a) short-circuit the user check at the top of the function, allowing logic to continue if an API key was provided, and (b) extend request.has_permission to validate the API key for the project. Additionally, the "create a new project" step would need to be short-circuited if the API keys did not positively identify a single user.

Finally, this would entail a change in setuptools to use keys when provided. Ideally, we would search for keys in certain directories (e.g. the project directory, ~/.pypa or its Windows equivalent, etc.) for a file with a specific naming convention, and use it if found.

Restrictions

The biggest restriction on this is that the API keys would only initially be usable for the upload functionality. (Presumably register could very shortly follow.)

Concerns

My biggest concern about this is the keys. Using RSA keys provides several useful benefits (passphrases, high entropy, etc.), but it also feels a good bit more complicated than what (for example) npm does. Other package managers just use direct API keys (which seems awfully insecure) or some less secure form of key-secret combo. One concern here is that if this is deemed too difficult to get set up, people may choose not to use it.

Another concern is "key collision". The idea here is to be able to have single package tokens, but most people work on lots of Python packages. Similarly, one might want a passphrase-based key to go in CI and a passphrase-less key to go on local disk. I think this sort of thing is solvable by being smart about naming and ordering. A potentially attractive idea is to actually look for project-specific keys in a subdirectory of the user's home directory before looking for a project-specific key in the project folder, then look for user-wide keys in the reverse order.
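The lookup order suggested above might look like the following sketch. The `~/.pypa` directory comes from the earlier section; every file name is a made-up placeholder, since no naming convention has been settled.

```python
from pathlib import Path
from typing import List


def candidate_key_paths(project_name: str, project_dir: Path, home: Path) -> List[Path]:
    """Return key file locations in the order they would be tried.

    All file names below are hypothetical placeholders.
    """
    key_dir = home / ".pypa"
    return [
        # Project-specific keys: home directory first, then the project folder.
        key_dir / f"{project_name}.key",
        project_dir / f".{project_name}.key",
        # User-wide keys: the reverse order, project folder first, then home.
        project_dir / ".pypi.key",
        key_dir / "default.key",
    ]
```

The first path that exists would be used, so a project-specific key always wins over a user-wide one.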

Conclusion

This is a writeup for the moment. I have the model written (a trivial task) and am going to start with the various pieces of plumbing written above. Feedback is definitely desired before I get too far into it.

moshez commented 6 years ago

Looks awesome!

One question though -- What does it mean to "provide" a private key? Presumably we're not expecting users to literally send their private keys to warehouse. Would it be a signature? On what? (File hash? File hash + file name?)

dstufft commented 6 years ago

I'm still digesting this, but I wanted to jot down my initial thoughts:

lukesneeringer commented 6 years ago

What does it mean to "provide" a private key? Presumably we're not expecting users to literally send their private keys to warehouse. Would it be a signature? On what? (File hash? File hash + file name?)

Sorry, I misspoke. I meant that they should upload a public key.

I am hesitant to have API keys that are not attached to users in some way.

Here is my rationale: often times, organizations want API keys that are independent of individual users. Essentially, a company does not want all of their keys to break because an individual user leaves, and most people do not make separate accounts on PyPI for work vs. personal.

A higher weight way to solve this problem would be to have explicit organizations to which credentials could be attached. Lower-weight version: Encourage the use of organization-level "users" -- but that has the downside of things like a single password that everyone shares, etc.

What kind of attacks are we hoping to protect against? What capabilities can we assume the attacker has? It's hard to judge whether or not there's something of value being provided by the use of RSA keys over something simpler without a clearly defined threat model to judge possible solutions against.

Ironically enough, I actually went for something heavier weight because of your thinking in the previous comment. :-)

Given that we have been storing passwords in plaintext since time immemorial, and given that most other package managers go for simpler solutions, there is a good chance that I am overthinking here. I think there are two primary concerns: (1) mistakenly leaked or otherwise woefully unsecured keys, and (2) sniffing. The proposal I am putting forward does basically nothing for (1) and effectively guards against (2).

Designing something that involves changes to other tooling (such as twine or setuptools) should ideally get some buy-in from the authors of those tools, and possibly discussion on distutils-sig (or at least a pointer mailed to distutils-sig about the discussion). One benefit of a simpler API key is that it could simply be piped through those other tools as a password, and thus wouldn't require wider agreement.

I would not have any issue with this approach; it would still improve on the status quo. (This is, of course, an inferior approach for the sniffing concern, but it has been fine for most other package managers to the best of my knowledge.)

dstufft commented 6 years ago

I've been thinking about this a lot and I think I've come up with the start of a proposal for how to handle this.

To start off with, I think that a public key crypto scheme is generally overkill for this application. We don't have N untrusted parties that need to be able to verify the authenticity of a request, just a single trusted party (PyPI) which means there is little need to have a scheme that hides the credential from PyPI itself. A public key crypto scheme would prevent people who can intercept traffic from getting the upload credentials, however PyPI also mandates TLS which provides the same properties (and if you can break our TLS, you can also just upload your own key to the account of the user you wish to upload as).

I do think that some sort of request signing scheme could be useful in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be more generally fit for an "upload 2.0" API that would eventually sunset the current API rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc just work immediately and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.

So given that, I think the best path forward is to use some sort of bearer token authentication scheme. The simplest of these would just be a simple API key where Warehouse generates a high entropy secret and shares that with the user. However that has a number of drawbacks, such as:

After thinking about this for a few days and talking it over with some folks who are much smarter than me, I think that the best path forward here is to use Macaroons. Macaroons are a form of bearer token, where Warehouse would mint a macaroon and pass it onto the user. In this regard they are similar to the simple API key design. Where macaroons get more powerful is that instead of baking things like "here is a list of projects that this macaroon is able to access" into the database, it is stored as part of the macaroon itself AND that given a macaroon, you can add additional "caveats" (like which projects it can access) and mint a new macaroon without ever talking to Warehouse.

This would allow a workflow like:

  1. User gets a Macaroon from Warehouse that is specific to their user and has access to all permissions and does not expire.
  2. User decides they want to utilize Travis to upload their "foobar" package, so they take their Macaroon and attach a new caveat, project: foobar to it, and mint a new Macaroon which does not expire and hand it to Travis.
  3. Travis wants to limit the ability for credentials they have to leak and be used persistently, so when their deployer code runs, instead of giving a token that is good ~forever, they attach a new caveat, expires: now() + 10 minutes and mint a new Macaroon that they pass into the deployer code as the password.
  4. Warehouse looks at the macaroon that the Travis deployer uploaded and then does:
    1. Sees which root key it used, looks that up from the DB (call this k0), does HMAC(k0, <initial data>) to get k1 (this is what was given to the user in step (1)).
    2. Adds the caveats added in step (2) and does HMAC(k1, <new data>) to get k2 (this is what was given to Travis in step (2)).
    3. Adds the caveat added in step (3) and does HMAC(k2, <new data>) to get k3 (this is what was sent to Warehouse by Travis in step (3)).
    4. Verifies that the macaroon sent by Travis is the same thing we generated as k3.
    5. Iterates over all of the caveats added throughout all of the steps, and evaluates them to ensure that they all evaluate to True for this specific request.
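The chained HMAC construction in the steps above can be sketched with the standard library alone. This is a simplified illustration, not the real macaroon wire format or any particular macaroon library: a macaroon is modelled as an (identifier, caveats, signature) tuple, and the caveat strings and the `check_caveat` callback are assumptions.

```python
import hashlib
import hmac
from typing import Callable, List, Tuple

Macaroon = Tuple[str, List[str], bytes]  # (identifier, caveats, signature)


def _chain(key: bytes, data: str) -> bytes:
    """One link of the chain: HMAC-SHA256 of `data` keyed by the previous value."""
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> Macaroon:
    """Server side: k1 = HMAC(k0, <initial data>)."""
    return identifier, [], _chain(root_key, identifier)


def attenuate(macaroon: Macaroon, caveat: str) -> Macaroon:
    """Holder side: add a caveat without knowing k0 or contacting the server."""
    identifier, caveats, signature = macaroon
    return identifier, caveats + [caveat], _chain(signature, caveat)


def verify(
    macaroon: Macaroon, root_key: bytes, check_caveat: Callable[[str], bool]
) -> bool:
    """Server side: rebuild the chain from k0, then evaluate every caveat."""
    identifier, caveats, signature = macaroon
    expected = _chain(root_key, identifier)
    for caveat in caveats:
        expected = _chain(expected, caveat)
    if not hmac.compare_digest(expected, signature):
        return False
    return all(check_caveat(caveat) for caveat in caveats)
```

Because each signature is keyed by the previous one, a holder can append caveats but cannot strip one off: removing a caveat would require the earlier signature, which the holder of the attenuated macaroon never saw.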

I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation.

We would need to figure out which caveats we want to support in Warehouse (even though anyone can add caveats, the caveats have to be supported by Warehouse so you can't add arbitrary ones). Off the top of my head I can think of the following (naming can be adjusted):

One caveat to the above, is that we likely should require both a not-before and an expires or neither. Having one or the other should likely be an error. The general idea is that in order for expiration like that to work as expected, the clocks of the system adding the expiration caveats and the clock of the system verifying them have to be relatively in sync. If someone sets an expires: now() + 10 minutes but their clock is accidentally set to 10 years in the future, they're going to get a macaroon that is valid for 10 years 10 minutes, instead of just 10 minutes -- and worse it's going to be a silent error where it just appears to work. However if we require both fields to either be present or absent, then they'd only be able to generate a macaroon that is valid for 10 minutes, in 10 years which would get rejected immediately and they'd be able to determine something is wrong.
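The both-or-neither rule could be enforced roughly like this. It is a sketch only: the function name and the use of datetime objects in place of parsed caveat values are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_time_window(
    not_before: Optional[datetime], expires: Optional[datetime], now: datetime
) -> bool:
    """Return True if `now` falls inside the window, or if no window was set.

    Supplying only one of the two bounds is treated as an error.
    """
    if (not_before is None) != (expires is None):
        raise ValueError("not-before and expires must be supplied together")
    if not_before is None:
        return True
    return not_before <= now < expires


# A client whose clock runs 10 years fast produces a window that is not yet
# valid, so the very first upload is rejected instead of silently succeeding.
server_now = datetime(2018, 5, 1, tzinfo=timezone.utc)
skewed_now = server_now + timedelta(days=3650)
rejected = not check_time_window(
    skewed_now, skewed_now + timedelta(minutes=10), server_now
)
```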

Internal to Warehouse, we'd need a table that would store all of the initial root keys (k0 in the above). Each of these keys would essentially be a tuple of identifier, key that the verification code could then look up. We would never expose the k0 to end users. We might also want to store any initial caveats that were added to the macaroon given to the user so that we can give the user a list of their macaroons as well as the caveats that were attached to each. Obviously we'd also need a UI for managing the macaroons that a user has, with the ability to delete them (which would effectively revoke them and make any sub-macaroon worthless) as well as create them, ideally with a UI to add caveats to the initial macaroon (even though the system will support end users adding caveats on their own without talking to Warehouse, a UI in Warehouse will be a lot more user friendly for the common case).
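To make the revocation property concrete, here is a small sketch in which an in-memory dict stands in for the root key table. `RootKeyStore` and its methods are hypothetical, and for brevity it only handles macaroons with no caveats attached.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


class RootKeyStore:
    """In-memory stand-in for the (identifier, key) table; k0 never leaves it."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def mint(self) -> Tuple[str, bytes]:
        """Create a root key and return (identifier, initial signature)."""
        identifier = secrets.token_hex(8)
        root_key = secrets.token_bytes(32)
        self._keys[identifier] = root_key
        signature = hmac.new(root_key, identifier.encode(), hashlib.sha256).digest()
        return identifier, signature

    def revoke(self, identifier: str) -> None:
        """Deleting the row makes this macaroon and all its descendants unverifiable."""
        self._keys.pop(identifier, None)

    def verify(self, identifier: str, signature: bytes) -> bool:
        """Check a caveat-free macaroon against the stored root key."""
        root_key = self._keys.get(identifier)
        if root_key is None:
            return False
        expected = hmac.new(root_key, identifier.encode(), hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)
```

Listing a user's macaroons then amounts to enumerating the stored identifiers, and deleting one is the whole revocation step.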

There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user, HOWEVER I think that we should always add an additional caveat that describes which user the macaroon belongs to. Essentially scoping the macaroon to just that user (not the same as scoping it to a specific user in the resources above, but saying "this macaroon can effectively act as $USER"). While the first cut may always add that, and may have a database column that always links a macaroon to a specific user, adding that caveat to start with means that in the future, if we want to have macaroons that are not tied to a specific user, we will be able to do that without accidentally granting super privileges to all existing macaroons.

What do you think?

dstufft commented 6 years ago

One nice thing about the above idea, is that end users can completely ignore the extra power granted by Macaroons if they want. They can simply treat it as an API key, generate a Macaroon via Warehouse, pass it to Twine as the password for their account and go on with life without giving it a second thought. Of course someone who wants to unlock additional power for delegation or resource constraining can opt into that and can utilize the additional benefits provided by Macaroons. The design of Macaroons is fairly brilliant in this aspect where it's very much a low cost abstraction for the basic use case, but enables you to delve deeper into it to unlock a lot more power.

In the hypothetical Travis example, the end user might not even be aware that Travis is adding additional caveats to their Macaroon (or that it's even possible to do that). It could easily be presented to them as nothing more than adding the API key to Travis, with Travis doing the extra expiration stuff completely behind the scenes for the benefit of the user.

I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire. While we generally trust TLS to protect these credentials, automatic scope limitation like that is basically zero cost and provides defense in depth so that in the case someone is able to look into the TLS stream (for example, a company MITM proxy) the credentials they get are practically useless once they've been used once.

moshez commented 6 years ago

To extend what @dstufft said -- we can have a constraint be "upload file with hash <hash>", which means replay attacks (short of pre-image attacks) are useless. If we further have twine auto-attenuate with this plus a short time frame, future pre-image attacks are useless too: there had better be a pre-image attack ready now.

This means that while, of course, we all love TLS a lot, with this in place, TLS would not be needed for security -- even complete breakage of TLS would not allow someone to upload a package with malicious code.
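A file-hash constraint of this kind could be expressed as a caveat string and checked against the uploaded bytes, roughly as follows. The `file-sha256:` prefix and both function names are invented for this sketch.

```python
import hashlib
import hmac

_PREFIX = "file-sha256: "  # hypothetical caveat name


def hash_caveat_for(content: bytes) -> str:
    """Client side: build a caveat that pins the macaroon to one file."""
    return _PREFIX + hashlib.sha256(content).hexdigest()


def check_hash_caveat(caveat: str, uploaded: bytes) -> bool:
    """Server side: accept only if the uploaded bytes match the pinned digest."""
    if not caveat.startswith(_PREFIX):
        return False
    pinned = caveat[len(_PREFIX):]
    actual = hashlib.sha256(uploaded).hexdigest()
    return hmac.compare_digest(pinned, actual)
```

A captured macaroon carrying such a caveat can only be replayed with the same file, so it is of no use for uploading different content.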

lukesneeringer commented 6 years ago

I do think that some sort of request signing scheme could be useful in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be more generally fit for an "upload 2.0" API that would eventually sunset the current API rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc just work immediately and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.

I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.

I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation.

I like this idea too. +1.

There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user, HOWEVER I think that we should always add an additional caveat that describes which user the macaroon belongs to.

I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed.

I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.

I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire.

This is definitely something that would be easy and valuable to do. It gives you the value of request signing, effectively.


I am sold on this. I will get an implementation of this in soon. Also, thanks @dstufft for teaching me about Macaroons. That is really valuable.

dstufft commented 6 years ago

@lukesneeringer I should mention that after talking through this more with people, I think that the right implementation would look something like:

Some replies to your comments:

I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.

We need a number of the values out of the macaroons in order to construct what the signing key should be in this hypothetical signing algorithm. It'd be possible to do something where we rip enough stuff out of the macaroon format so that caveats are still sent along with the request, but not the actual HMAC signatures, so that the server could still construct what the expected signing key is. I'm not sure that would be worth the effort.

I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed.

I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.

Basically the user caveat is how you say "This macaroon (and by nature, all macaroons created from this macaroon) is scoped to only resources that X user has access to and acts as if it were X user." The reason this is a caveat instead of just a column in the database (although it likely should be one of those too) is to keep our options open in the future, so that we can potentially start creating macaroons without that caveat (perhaps with a resources: Project(foobar) caveat) without accidentally upgrading all previous macaroons to unscoped.

So to start out with, we'd always include that acts-as-user: caveat, but maybe in the future we don't and everything works fine.

dstufft commented 6 years ago

Oh, and the opaque, unique value would also give us something we can enumerate to display a list of macaroons owned by a user (to allow them to delete/revoke unused ones) and allows us to do things like record in the database whenever they are used, so we can display the last time each one was used too.

lukesneeringer commented 6 years ago

All this sounds good. Also, apologies, I was replying inline before I read the entire post, so my first quote above is less relevant than I thought. If twine ever makes the addition you recommend to add the date range, it accomplishes the same thing as signing would.

lukesneeringer commented 6 years ago

(Poking to say this is in progress; a PR will be coming soon-ish.)

brainwane commented 6 years ago

@lukesneeringer Hi! Would welcome seeing the PR! :) And if you'll be available for the Open Space/Birds of a Feather session during PyCon, we could talk about this then, too. (https://wiki.python.org/psf/PackagingSprints has more topics & people on it now.)

lukesneeringer commented 6 years ago

Yeah, I am about half done. It is going slower than planned due to time. I will be at the BOF and the first day of sprints and plan to finish it then. :-)

lukesneeringer commented 6 years ago

(Update for sprints: Work is going on this branch).

fschulze commented 6 years ago

@lukesneeringer how far along is this? There is quite a bit in the branch, but it's hard to see what is still missing.

steiza commented 6 years ago

Very sorry for the delay! I have posted a pull request based on the work we started at US PyCon 2018 (back in May...): https://github.com/pypa/warehouse/pull/4599

I'm going to block off some time in the near future to be able to respond to feedback / iterate on this diff as needed. Thanks!

steiza commented 5 years ago

I botched a merge with upstream on that branch, see instead #4949.

jaraco commented 5 years ago

I'm quite eager to see this feature, especially if it helps solve the issue of creating a credential that's safe to share in a code repo with collaborators. We have a situation in importlib_resources where we would like for the CI/CD toolchain to be able to publish releases, but GitLab doesn't provide the robust encryption like Travis CI does... and they allow passwords to be unmasked by anybody with admin on the group/repository. Meaning because I've configured my PyPI password for gitlab.com/python-devs, any admin of python-devs can release packages as me O_O.

Will this access token feature work to satisfy that need? That is, will it be possible for one of the maintainers or owners of a PyPI project to generate a token (or tokens) for a limited subset of projects to which he has access and then use that token for CI uploads just to those projects?

dstufft commented 5 years ago

@jaraco Yes that's the plan.

dimaqq commented 5 years ago

@dstufft has the macaroon thing gone away or still on the books?

Pardon my ignorance, but it seems the extra macaroon functionality (derived tokens) is well nice, but not widely used (?) and not strictly needed.

IMO (coming from user's perspective), JWT (or paseto, or an opaque token) would do the job just as well (perhaps with extra server round trip when derived token is needed), and is more common/understandable/supported. Am I about right?

LucidOne commented 5 years ago

How different is this from OAuth?

brainwane commented 5 years ago

Good news - this work is now underway, and you can expect to see @woodruffw working on it. Thanks to Open Tech Fund for making this work possible!

brainwane commented 5 years ago

@woodruffw and @nlhkabu will be implementing per-user API keys as an alternative form of authentication in the PyPI auth flows (project maintainers/owners will be able to choose whether to use username/password OR an API key to authenticate when uploading packages). These will be application-specific tokens scoped to individual users/projects (and the task will also cover adding token-based login support to twine and setuptools, to improve the security of uploads).

Subtasks (each of which will probably turn into its own GitHub issue):

So once this is done:

We'll be working on this in 2019. The Packaging Working Group, seeking donations and further grants to fund more work, got some new funding from the Open Technology Fund, and the audit log is part of the current grant-funded project.

ewdurbin commented 5 years ago

Looks like this is next up on the agenda. I wanted to take a moment to throw out a few specifications we probably need for this 1st generation of API Tokens:

Questions:

brainwane commented 5 years ago

I spoke with @woodruffw yesterday and he's excited about getting started on this feature this month! He'll be working on the Warehouse side.

My understanding, based on @ewdurbin's comment (above), is that we won't need changes to Twine or other client-side tooling, because the upload-scoped API key will be a drop-in replacement for existing auth. But I'm open to being shown I am wrong here. :-) (cc @jaraco @pganssle FYI.)

fschulze commented 5 years ago

I hope this will still be based on macaroons as outlined above, because that will allow things like creating one time tokens created client side to push packages from a devpi instance to pypi. Currently we send the pypi credentials to the devpi-server which isn't ideal and wouldn't be changed if we can't amend the token.

woodruffw commented 5 years ago

  • Should we require a username be supplied alongside API Tokens at upload time, or is the token itself sufficient?

I plan to model tokens as linked tightly to their originating users, so I think just the token will be sufficient 🙂

I hope this will still be based on macaroons as outlined above, because that will allow things like creating one time tokens created client side to push packages from a devpi instance to pypi. Currently we send the pypi credentials to the devpi-server which isn't ideal and wouldn't be changed if we can't amend the token.

I wasn't planning on using macaroons for the first implementation, for two reasons:

  1. Macaroons primarily serve to enable distributed authorization/attestation and permission attenuation, things that PyPI (as a centralized source) doesn't currently require.
  2. Macaroons themselves are long, even without a large constraint set. For just uploads, short tokens with constraints modeled on the PyPI side will probably be sufficient.

My (current) plan is to make the implementation flexible, to enable future iterations to use macaroons.
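
For illustration, one way to read "short tokens with constraints modeled on the PyPI side" is an opaque random token whose scope lives in a server-side record. This is only a sketch under that assumption, not Warehouse code; `TOKENS`, `issue_token`, and `authorize` are hypothetical names, and the dict stands in for a database table.

```python
import hashlib
import secrets

# Hypothetical in-memory stand-in for a table: token hash -> constraints.
TOKENS = {}

def issue_token(user, projects, actions=("upload",)):
    """Create a random opaque token; only its hash is kept server-side."""
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode()).hexdigest()
    TOKENS[digest] = {
        "user": user,
        "projects": set(projects),
        "actions": set(actions),
    }
    return token

def authorize(token, project, action):
    """Return the owning user if the token permits this action, else None."""
    record = TOKENS.get(hashlib.sha256(token.encode()).hexdigest())
    if record is None:
        return None  # unknown or deleted token
    if project not in record["projects"] or action not in record["actions"]:
        return None  # outside the scope recorded for this token
    return record["user"]
```

Because the record carries the user, the token alone identifies who is uploading. The holder cannot narrow the scope further without asking the server for a new token, which is the limitation the devpi use case runs into.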

nealmcb commented 5 years ago

Thanks, @fschulze, for the question. @woodruffw I think the idea of allowing single-use tokens is a good one, and I'm not sure if your planned approach handles it or not. Can you say more about constraints modeled on the PyPI side, and how a devpi use case might work?

fschulze commented 5 years ago

There is already a PR using macaroons which looks promising, why redo this work with less functionality? https://github.com/pypa/warehouse/pull/4949

I don't think the size of macaroons is an issue. Such tokens will be stored in a file, password safe or environment variable. Nobody will type them out.

I see several use cases for macaroons which aren't possible otherwise:

  1. for devpi:
    • The user would get a token from PyPI which only allows uploading releases which they have upload rights to.
    • devpi client would amend that token with for example the hash of the release and use the new token to send a push request to devpi-server
    • devpi-server takes the file it has and uses the token to upload to PyPI
    • PyPI checks the token, sees the hash restriction, checks the file and other restrictions in the token and either authorizes or denies the upload
    • The token can only be used to upload this one file, if the connection between the user and pypi is compromised (devpi-server is malicious or whatever), then the token is useless for anything else

With a simple upload token the compromise of that token along the way is much more severe.

  2. Using CI for releases

    • several projects use Travis-CI and other CI systems to make releases from tags automatically
    • projects would create a token which only allows uploads for that specific project
    • a further restriction could be an IP range restriction from where the upload may occur
    • if the CI system has further support for macaroons, then the token could be amended with a restriction that the CI system has to sign the token before use with PyPI
    • PyPI would check all the restrictions before authorizing the upload
  3. delegating maintenance

    • let's say there is a project which has a new major release
    • the maintainer doesn't want to work on the old version anymore
    • there is someone willing to provide bugfix releases for the old version
    • a new token is generated which restricts uploads to versions for that old major version
    • the new maintainer can do any kind of releases they want, but is not able to interfere with the current releases

I'm sure people can come up with more once the concept is understood and the necessary caveats are in place.
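
The devpi flow in the first use case can be sketched roughly as follows. This is a simplified HMAC chain in the spirit of the macaroons construction, not the pymacaroons API or the code in the PR; the `name = value` caveat format, the function names, and the request dict are all assumptions made for the example.

```python
import hashlib
import hmac

def _chain(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def mint(root_key, identifier):
    """PyPI side: issue a token with no caveats."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}

def attenuate(token, caveat):
    """Holder side: append a caveat using only the current signature."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }

def verify(root_key, token, request):
    """PyPI side: recompute the chain, then check each caveat against the request."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    for caveat in token["caveats"]:
        name, _, value = caveat.partition(" = ")
        if request.get(name) != value:
            return False  # unmet or unrecognised caveat: deny
    return True

root_key = b"per-token secret held by PyPI"
release = b"fake sdist bytes"
digest = hashlib.sha256(release).hexdigest()

# The user's token is limited to one project; the devpi client then pins it to one file.
user_token = attenuate(mint(root_key, "token-1"), "project = example-pkg")
push_token = attenuate(user_token, "sha256 = " + digest)
```

Here `push_token` is what devpi-server would receive. If it leaks, it only authorizes an upload of that exact file to that project.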

dimaqq commented 5 years ago

Hi Florian,

It's a shame that there's no Wikipedia page for macaroons, as the Google paper is relatively hard to read.

I feel that the same class of problems that x509 implementations suffered from is generally applicable to macaroon implementations.

Caveat language

The server and the macaroon "deriver" must agree on the language of the additional restrictions, aka "caveats".

You've mentioned a few restrictions:

Someone may want to come up with:

The list could go on forever :)

Caveat language updates

Changes to the language will be hard once there are live clients; client and server implementations would have to be synchronised...

Invalidation

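Given server S with secret key K, intermediaries M1, M2, M3, M4, etc., and final macaroons MFn, imagine that one of M1..M4 was leaked. The server gets flooded with MFn's, what can be done?

  • blacklisting K means banning all derived macaroons, even those that were not compromised
  • blacklisting M4 runs the risk that actual leak was M3, and attacker will issue a new M4'
  • blacklisting all possible M4'''' requires unbounded storage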

In other words, server (admin) would have to be very smart and guess correctly which intermediary to blacklist.

If there's an automated system to report lost tokens, the requests have to be signed (by parent? self-signed?), and that system too can be abused.

My 2c: I'd rather see a simpler, working system sooner.

dstufft commented 5 years ago

Just to clear a few things up:

Changes to the language will be hard once there are live clients; client and server implementations would have to be synchronised...

This really isn't true (or at least, isn't a big deal). If a new caveat type is going to be added, the only real constraint is that support must be added to Warehouse first. If a client never updates to add support for that caveat, the only downside is that the client won't be able to use that specific caveat. Everything else will continue to function correctly: even if that client is passed a macaroon that already has that caveat applied, things will continue to work fine. A client only needs to understand the caveats it personally adds to a macaroon; it isn't allowed to infer any information from existing caveats on that macaroon (because it can't validate them), nor can it remove caveats (since that would invalidate the HMAC). All it can do is append new caveats.
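
To make the append-only property concrete, here is a minimal sketch, assuming the HMAC chaining from the macaroons paper (each caveat is signed using the previous signature as the key). It is not Warehouse's code or any library's API, and the caveat strings are made up.

```python
import hashlib
import hmac

def sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def signature_for(root_key, identifier, caveats):
    """What the server recomputes when it validates a macaroon."""
    sig = sign(root_key, identifier)
    for caveat in caveats:
        sig = sign(sig, caveat)
    return sig

root_key = b"known only to the server"
caveats = ["project = example-pkg", "some-future-caveat = xyz"]
sig = signature_for(root_key, "token-1", caveats)

# A client that has never heard of "some-future-caveat" can still append
# its own caveat, using only the signature it was handed.
new_caveats = caveats + ["sha256 = abc123"]
new_sig = sign(sig, new_caveats[-1])
appended_ok = hmac.compare_digest(
    new_sig, signature_for(root_key, "token-1", new_caveats)
)

# Dropping a caveat while keeping the old signature does not validate.
stripped_ok = hmac.compare_digest(
    sig, signature_for(root_key, "token-1", caveats[:1])
)
```

In this sketch `appended_ok` is true and `stripped_ok` is false: the client never needs the root key to add a restriction, and it cannot recover an earlier signature to take one away.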

Given server S with secret key K, intermediaries M1, M2, M3, M4, etc., and final macaroons MFn, imagine that one of M1..M4 was leaked. The server gets flooded with MFn's, what can be done?

  • blacklisting K means banning all derived macaroons, even those that were not compromised
  • blacklisting M4 runs the risk that actual leak was M3, and attacker will issue a new M4'
  • blacklisting all possible M4'''' requires unbounded storage

In other words, server (admin) would have to be very smart and guess correctly which intermediary to blacklist.

There wouldn't be a singular secret K. Basically the way the PoC implements this is every time a user goes into Warehouse and asks for an API key, a new secret key is generated for that key. So a user can have an unbounded number of macaroons, all with a different lineage to different secret keys. There is no such thing as revoking a specific intermediate; you can only revoke the entire secret key and any macaroons derived from it.

This is basically no different than the "simple" API key solution. Users are 100% in control of the blast radius of revoking their keys. If they use 5 different computers and 3 different services and they generate a single API key OR they generate a single macaroon and hand the same one to all of them, in both cases they will need to revoke that singular key and replace it in all services. Likewise if they generate a different API key or create a new macaroon for each service, then if one is compromised they've successfully limited the blast radius to that single service and only have to revoke the one key/macaroon.
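
A rough sketch of that revocation model, assuming one random secret per issued key and a plain dict in place of the database. The names (`ROOT_KEYS`, `create_api_key`, `derive`, `is_valid`, `revoke`) are hypothetical and this is not the PoC's code; only the signature is checked here, not the caveats' meaning.

```python
import hashlib
import hmac
import secrets

ROOT_KEYS = {}  # hypothetical store: key id -> per-key secret

def _mac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def create_api_key():
    """Each request for an API key gets its own fresh secret."""
    key_id = secrets.token_hex(8)
    ROOT_KEYS[key_id] = secrets.token_bytes(32)
    return key_id, [], _mac(ROOT_KEYS[key_id], key_id)

def derive(macaroon, caveat):
    """Anyone holding a macaroon can derive a more restricted one."""
    key_id, caveats, sig = macaroon
    return key_id, caveats + [caveat], _mac(sig, caveat)

def is_valid(macaroon):
    key_id, caveats, sig = macaroon
    root = ROOT_KEYS.get(key_id)
    if root is None:
        return False  # secret revoked (or never existed)
    expected = _mac(root, key_id)
    for caveat in caveats:
        expected = _mac(expected, caveat)
    return hmac.compare_digest(expected, sig)

def revoke(key_id):
    """Deleting the secret invalidates the key and everything derived from it."""
    ROOT_KEYS.pop(key_id, None)
```

Creating one key per service and revoking only the compromised one leaves the others valid, which matches the blast-radius point above.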

Effectively, the macaroon system as I originally intended it to be implemented, is basically just a better form of simple API keys. Users who don't wish to use the extended power of macaroons can treat them entirely like simple API keys and never think about them any differently. Users (or client tools like twine) that want to use that extended power can do so.

Basically the only downside (other than some complexity on the server side) for someone who wishes to use a macaroon as a simple API key, is that they're quite a bit longer than a simple API key typically is.

nealmcb commented 5 years ago

@dimaqq Funny you should ask. I started a draft of a wikipedia article on macaroons a while ago, which got deleted for lack of submission or updates: https://en.wikipedia.org/wiki/Draft:Macaroons_(authorization) If it makes sense to the folks here, and it seems "notable" (in Wikipedia terms) let's clean it up and submit it!

fschulze commented 5 years ago

It's a shame that there's no Wikipedia page for macaroons, as the Google paper is relatively hard to read.

Hrm, the Mozilla Tech Talk seems to be gone: https://air.mozilla.org/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/ That was pretty good. Did anyone find a copy of that somewhere, or knows someone at mozilla who could look into why it's gone?

nealmcb commented 5 years ago

Good tip, @fschulze. It turns out the Internet Archive has the original blurb: Mozilla Tech Talk: Macaroons, and the 54-minute video of the "air mozilla" talk by Úlfar Erlingsson is still available at Macaroons talk by Úlfar Erlingsson

fschulze commented 5 years ago

I found a better version on YouTube: https://www.youtube.com/watch?v=CGBZO5n_SUg

woodruffw commented 5 years ago

Just wrapped up a call with @ewdurbin, @dstufft, and @brainwane and came up with the following roadmap:

  1. We're going to work with macaroons from the very beginning, and not go with dumb API keys as I proposed above.
  2. In particular, I plan to work off of @dstufft's work in https://github.com/dstufft/warehouse/pull/2, pulling in work from https://github.com/pypa/warehouse/pull/4949 where appropriate.
  3. In order to minimize the amount of time spent on implementation, I intend to deliver a PoC version without constraints or a caveat language. This deliverable will meet the requirements of the SoW (allowing users to replace their username/password with a single token for upload only), and will serve as the foundation for future iterations. Upload-only enforcement will be handled by route whitelisting and a version identifier within the macaroon, preventing future iterations from inadvertently creating "god" tokens.
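
Upload-only enforcement via route whitelisting and a version identifier could look roughly like the following. This is a sketch only: the route names and the `ALLOWED_ROUTES` table are hypothetical, not Warehouse's actual routes.

```python
# Hypothetical mapping from macaroon version to the routes it may reach.
ALLOWED_ROUTES = {
    1: frozenset({"upload"}),
}

def route_permitted(token_version, route):
    """Deny by default: unknown versions and unlisted routes are rejected."""
    return route in ALLOWED_ROUTES.get(token_version, frozenset())
```

With a table like this, a later token version has to be granted routes explicitly, so a version-1 token cannot turn into a "god" token by accident.
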

woodruffw commented 5 years ago

Began work on this in #6084.

dimaqq commented 5 years ago

I just realised that there's a patent 😨 https://patents.google.com/patent/US9397990 Who can check if/how this might affect PYPA?

nealmcb commented 5 years ago

Very good observation and question!

I'm asking Google to add that patent to their Open Patent Non-Assertion Pledge – Google. It seems like a no-brainer to me.....

ewdurbin commented 5 years ago

Who can check if/how this might affect PYPA?

I've requested that @VanL, the PSF's General Counsel, weigh in on this question.