This is another thing I've been wanting to do, but it is likely a post-launch task. I'm a bit on the fence about exactly how to handle them, but one option I've been considering, instead of API keys, is using client certificates for TLS. That would give built-in support for a signing-based approach, high entropy, the ability to use it for all uploads (it could typically be stored password-protected, with a password-less option for automation), and expiration of the token.
One problem with this is that it would mean we can't route uploads through our CDN. However, uploads don't really gain anything by going through the CDN (in fact it's a bit harmful: since uploads need a longer timeout than normal requests, we're forced to have high, 20+ second timeouts on upload).
I've also considered something like OAuth here instead of just an API Key which would solve a lot of these problems as well, in addition to making it possible to securely grant other projects the ability to modify just one package (or one scope inside of that package).
There's also the likely future signing tool, TUF, where we could just enforce that all uploads must be signed by a valid key for that author, and use that key as the authentication.
A lot of different options here, which is another reason why it's likely a post-launch task :)
I really, really want to get my PyPI information out of CI. At the risk of responding to a years-old thread... I want to volunteer to do this work (as well as #996). :-)
At this point, Warehouse is launched (albeit in beta), and the legacy upload endpoint is deprecated. I assert it would be reasonable to add this, although others who have actually been thinking about this for more than a few days might know better than I (so feel free to chime in and tell me!).
Before reading @dstufft's comments above, my thought was, "Implement API keys", but I have nothing against the idea of certificates.
Here is what I think is needed (sub out "key" below with "certificate" if we go that route):
Would anyone have any objection to my taking some time to scope this out further, with an eye to getting the work in soon-ish? (Since I am new around here, it would probably require some review cycles from @ewdurbin, @dstufft, etc.)
Thanks for your note, @lukesneeringer, and sorry for the slow response! Thank you for volunteering to do this work!
As I think you know, but just for context for future folks finding this discussion, the folks working on Warehouse have gotten funding to concentrate on improving and deploying Warehouse, and have kicked off work towards our development roadmap -- the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable.
So that's what Ernest, Dustin, and Nicole have been concentrating on and will be concentrating on for the next few months. But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then.
Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing.
Thanks again! Talk with you soon.
> But I'm putting your suggestion on the agenda for a sync-up meeting we'll have tomorrow, and we'll have more thoughts for you then.
Sounds good.
My guess is that this is probably work that can be done in parallel to the Warehouse improvements. The trick would be that the keys would not work on legacy PyPI, and therefore anyone using the legacy URL would not be able to use them. (However, I suppose it might be the case that review cycles or whatnot would not be available.)
> Also, Ernest wants to help folks get started as new Warehouse contributors, and has 30-minute 1:1 slots available each week, in case that's something you, or someone you know, is interested in doing.
Yep -- we already did that. :-)
@lukesneeringer Oh great, glad you and Ernest have already started working together!
In our meeting today we said "yay" about you working on this! Please go ahead and start scoping it out and let us know your thought process as you work. I could imagine you finding the SimplySecure resources useful on a UX level.
We also decided that, as a new feature, this belongs in a future milestone. But we will do our level best to review your work as you have it!
Could I please ask you to also comment at #996 to mention there that you're working on it?
That sounds good. I currently plan to be at only day one of PyCon sprints, but I have not booked plane tickets yet, so that is mutable.
I'm going to be at all four days, and I think a number of other Python packaging/distribution developers will too. I think it'll likely be a good time to hash out architectural stuff and do some pair programming and in-person reviews. So if you could be there two or 2.5 days that would probably be of benefit.
@lukesneeringer how is this going? Do you have any plans or code that you'd like us to look at?
@brainwane Hi there; I have been on vacation. I will have a plan (and some code) for you to look at on Friday. :-)
@brainwane @ewdurbin et al.
I have started doing research and have put a minimal amount of code to paper, but I want to bring in other voices at this point.
I assert that a new database model should be added to `packaging` which adds the API keys. My rationale for putting this in `packaging` rather than in `accounts` is simply that it is going to have a relationship to `Project`, and this avoids a circular import (or circular mental reasoning).
As far as the contents of the keys, I am leaning toward using RSA keys, and having the interface essentially allow you to upload the public key (meaning that initially the user will be responsible for creating said keys). The request would include a signature (signed with the private key) which is validated against the expected signature using the public key.

There are a few downsides to this approach; the big one is that it puts the burden on the package maintainer to generate the key. We could potentially later do what some other sites do, where they provide generation, store the public key in the database, and give a forced one-time download of the private key. I think we should start with user-generated keys, however, because it allows users to generate encrypted keys (and store the encryption key in CI).
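For concreteness, here is a rough sketch of that sign/verify round trip using the `cryptography` library. What exactly gets signed (a file hash here) is an assumption on my part; that question is raised below.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

# Client side: the maintainer generates a keypair, registers the public
# half with PyPI, and signs each upload request with the private half.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
payload = b"sha256:..."  # hypothetical: the hash of the file being uploaded
signature = private_key.sign(payload, PSS, hashes.SHA256())

# Server side: Warehouse verifies the signature against the stored
# public key; InvalidSignature means the request is rejected.
public_key = private_key.public_key()
try:
    public_key.verify(signature, payload, PSS, hashes.SHA256())
except InvalidSignature:
    raise PermissionError("signature does not match the registered key")
```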
I propose the following data model:
```python
# Imports and foreign-key targets assume Warehouse's layout (and that
# db.Model supplies the primary key).
from sqlalchemy import Column, DateTime, ForeignKey, Text, orm, sql

from warehouse import db
from warehouse.accounts.models import User
from warehouse.packaging.models import Project


class AccessKey(db.Model):
    """Access keys for project access separate from passwords."""

    __tablename__ = "access_keys"

    # We store a public key, and the client is responsible for signing
    # requests using the corresponding private key.
    public_key = Column(Text)

    # An access key must be attached to either a user or a project,
    # and may be attached to both.
    #
    # Attaching to a user limits the scope of the key to projects which
    # that user can access (at the time access is attempted, not when the
    # key is made). It is possible for this set of projects to be zero.
    #
    # Attaching to a project limits the scope of the key to that project.
    user_id = Column(ForeignKey("users.id"), nullable=True)
    user = orm.relationship(User, backref="access_keys", lazy=False)

    project_id = Column(ForeignKey("projects.id"), nullable=True)
    project = orm.relationship(Project, backref="access_keys", lazy=False)

    expiry = Column(DateTime(timezone=False), nullable=True)

    created = Column(
        DateTime(timezone=False),
        nullable=False,
        server_default=sql.func.now(),
    )
```
What is important here is the relationships with `user` and `project` -- essentially, a key can be attached to either or both. If attached to a user (and only a user), then uploads may be performed for anything that user can access at the time the upload is attempted. Keys attached to projects (and only a project) may be used for that project with no other authentication. Keys attached to both must meet both restrictions (this implies that the key could provide no privileges whatsoever should the user lose access to the project later).
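A hedged sketch of how that rule might be checked at upload time (the function and the `user_can_access` helper are hypothetical, not existing Warehouse code):

```python
import datetime


def key_permits_upload(key, project, user_can_access):
    """Sketch of the either/both scoping rule for an AccessKey."""
    # An expired key grants nothing at all.
    if key.expiry is not None and key.expiry <= datetime.datetime.utcnow():
        return False
    # Project-attached: only valid for that one project.
    if key.project is not None and key.project != project:
        return False
    # User-attached: only valid for projects the user can access *right
    # now* -- which may be an empty set if access was revoked.
    if key.user is not None and not user_can_access(key.user, project):
        return False
    # A key attached to both must have passed both checks above.
    return True
```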
I assert that this would require an additional `auth_policy` class to be added, which would validate that the API key was sent. If user-based authentication was also performed, or if the key was tied to a specific user, it would return that user; otherwise it would return a stub user object.
Then, logic needs to be added to the `file_upload` function in `forklift/legacy.py` to (a) short-circuit the user check at the top of the function, allowing logic to continue if an API key was provided, and (b) extend `request.has_permission` to validate the API key for the project. Additionally, the "create a new project" logic would need to be short-circuited if the API keys did not positively identify a single user.
Finally, this would entail a change in `setuptools` to use keys when provided. Ideally, we would search certain directories (e.g. the project directory, `~/.pypa` or its Windows equivalent, etc.) for a file with a specific naming convention, and use it if found.
The biggest restriction on this is that the API keys would only initially be usable for the upload functionality. (Presumably `register` could very shortly follow.)
My biggest concern about this is the keys. Using RSA keys provides several useful benefits (passphrases, high entropy, etc.), but it also feels a good bit more complicated than what (for example) npm does. Other package managers just use direct API keys (which seems awfully insecure) or some less secure form of key-secret combo. One concern here is that if this is deemed too difficult to get set up, people may choose not to use it.
Another concern is "key collision". The idea here is to be able to have single package tokens, but most people work on lots of Python packages. Similarly, one might want a passphrase-based key to go in CI and a passphrase-less key to go on local disk. I think this sort of thing is solvable by being smart about naming and ordering. A potentially attractive idea is to actually look for project-specific keys in a subdirectory of the user's home directory before looking for a project-specific key in the project folder, then look for user-wide keys in the reverse order.
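As a sketch of that lookup order (the `~/.pypa` directory and the `<name>.key` naming convention are the hypothetical ones mentioned above):

```python
from pathlib import Path
from typing import Optional


def find_upload_key(project_name: str, project_dir: Path) -> Optional[Path]:
    """Return the first key file found, searching most specific first."""
    candidates = [
        # Project-specific key in the user's key directory wins first...
        Path.home() / ".pypa" / f"{project_name}.key",
        # ...then a project-specific key in the project folder...
        project_dir / f"{project_name}.key",
        # ...then user-wide keys, searched in the reverse order.
        project_dir / "pypi.key",
        Path.home() / ".pypa" / "pypi.key",
    ]
    return next((path for path in candidates if path.is_file()), None)
```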
This is a writeup for the moment. I have the model written (a trivial task) and am going to start with the various pieces of plumbing written above. Feedback is definitely desired before I get too far into it.
Looks awesome!
One question though -- What does it mean to "provide" a private key? Presumably we're not expecting users to literally send their private keys to warehouse. Would it be a signature? On what? (File hash? File hash + file name?)
I'm still digesting this, but I wanted to jot down my initial thoughts:
> What does it mean to "provide" a private key? Presumably we're not expecting users to literally send their private keys to warehouse. Would it be a signature? On what? (File hash? File hash + file name?)
Sorry, I misspoke. I meant that they should upload a public key.
> I am hesitant to have API keys that are not attached to users in some way.
Here is my rationale: oftentimes, organizations want API keys that are independent of individual users. Essentially, a company does not want all of its keys to break because an individual user leaves, and most people do not make separate accounts on PyPI for work vs. personal.
A heavier-weight way to solve this problem would be to have explicit organizations to which credentials could be attached. A lower-weight version: encourage the use of organization-level "users" -- but that has the downside of things like a single password that everyone shares, etc.
> What kind of attacks are we hoping to protect against? What capabilities can we assume the attacker has? It's hard to judge whether or not there's something of value being provided by the use of RSA keys over something simpler without a clearly defined threat model to judge possible solutions against.
Ironically enough, I actually went for something heavier weight because of your thinking in the previous comment. :-)
Given that we have been storing passwords in plaintext since time immemorial, and given that most other package managers go for simpler solutions, there is a good chance that I am overthinking here. I think there are two primary concerns: (1) mistakenly leaked or otherwise woefully unsecured keys, and (2) sniffing. The proposal I am putting forward does basically nothing for (1) and effectively guards against (2).
> Designing something that involves changes to other tooling (such as twine or setuptools) should ideally get some buy-in from the authors of those tools, and possibly discussion on distutils-sig (or at least a pointer mailed to distutils-sig about the discussion). One benefit of a simpler API key is that it could simply be piped through those other tools as a password, and thus wouldn't require wider agreement.
I would not have any issue with this approach; it would still improve on the status quo. (This is, of course, an inferior approach for the sniffing concern, but it has been fine for most other package managers to the best of my knowledge.)
I've been thinking about this a lot and I think I've come up with the start of a proposal for how to handle this.
To start off with, I think that a public key crypto scheme is generally overkill for this application. We don't have N untrusted parties that need to be able to verify the authenticity of a request, just a single trusted party (PyPI) which means there is little need to have a scheme that hides the credential from PyPI itself. A public key crypto scheme would prevent people who can intercept traffic from getting the upload credentials, however PyPI also mandates TLS which provides the same properties (and if you can break our TLS, you can also just upload your own key to the account of the user you wish to upload as).
I do think that some sort of request signing scheme could be useful, in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be more generally fit for an "upload 2.0" API that would eventually sunset the current API, rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc. just work immediately, and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.
So given that, I think the best path forward is to use some sort of bearer token authentication scheme. The simplest of these would just be a simple API key where Warehouse generates a high entropy secret and shares that with the user. However that has a number of drawbacks, such as:
After thinking about this for a few days and talking it over with some folks who are much smarter than me, I think that the best path forward here is to use Macaroons. Macaroons are a form of bearer token, where Warehouse would mint a macaroon and pass it on to the user. In this regard they are similar to the simple API key design. Where macaroons get more powerful is that instead of baking things like "here is a list of projects that this macaroon is able to access" into the database, that information is stored as part of the macaroon itself, AND, given a macaroon, you can add additional "caveats" (like which projects it can access) and mint a new macaroon without ever talking to Warehouse.
This would allow a workflow like:

1. A user goes to PyPI and has Warehouse mint them a Macaroon to use as an upload credential.
2. The user wants Travis to be able to deploy the project `foobar`, so they add a caveat like `project: foobar` to it, and mint a new Macaroon which does not expire and hands it to Travis.
3. At deploy time, Travis adds a caveat like `expires: now() + 10 minutes` and mints a new Macaroon that they pass into the deployer code as the password.
4. Warehouse receives the upload and verifies the Macaroon: it looks up the root key (`k0`), does `HMAC(k0, <initial data>)` to get `k1` (this is what was given to the user in step (1)), does `HMAC(k1, <new data>)` to get `k2` (this is what was given to Travis in step (2)), does `HMAC(k2, <new data>)` to get `k3` (this is what was sent to Warehouse by Travis in step (3)), validates the signature using `k3`, and finally checks that each caveat evaluates to `True` for this specific request.

I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation.
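For illustration, a minimal sketch of that delegation chain using the `pymacaroons` library; the caveat strings, identifiers, and key handling here are made up, not a final Warehouse format:

```python
from pymacaroons import Macaroon, Verifier

# Step 1: Warehouse mints the user's macaroon. The root key k0 never
# leaves the server; the user only receives the signed macaroon.
root_key = "k0-high-entropy-secret-held-by-warehouse"
m_user = Macaroon(location="pypi.org", identifier="some-key-id", key=root_key)
m_user.add_first_party_caveat("user: example")

# Step 2: the user attenuates it offline for Travis -- no server call.
m_travis = Macaroon.deserialize(m_user.serialize())
m_travis.add_first_party_caveat("project: foobar")

# Step 3: Travis attenuates further right before the deploy.
m_deploy = Macaroon.deserialize(m_travis.serialize())
m_deploy.add_first_party_caveat("expires: 2018-06-01T00:10:00Z")

# Step 4: Warehouse replays the HMAC chain from k0 and checks every
# caveat against the incoming request.
v = Verifier()
v.satisfy_exact("user: example")
v.satisfy_exact("project: foobar")
v.satisfy_general(lambda c: c.startswith("expires: "))  # a real check parses and compares times
assert v.verify(m_deploy, root_key)
```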
We would need to figure out which caveats we want to support in Warehouse (even though anyone can add caveats, the caveats have to be supported by Warehouse so you can't add arbitrary ones). Off the top of my head I can think of the following (naming can be adjusted):
- `not-before`: a timestamp before which the macaroon is not valid.
- `expires`: a timestamp after which the macaroon is no longer valid.
- Some way of scoping the macaroon to specific resources (`Project`, `Release`, `File`, `User`, maybe more in the future?). These might be best as top-level caveats like `users: []`, `releases: []`, etc. Will require work to figure out the best way to represent this.

One caveat to the above is that we likely should require both a `not-before` and an `expires`, or neither; having only one of the two should likely be an error. The general idea is that in order for expiration like that to work as expected, the clock of the system adding the expiration caveats and the clock of the system verifying them have to be relatively in sync. If someone sets an `expires: now() + 10 minutes` but their clock is accidentally set to 10 years in the future, they're going to get a macaroon that is valid for 10 years and 10 minutes instead of just 10 minutes -- and worse, it's going to be a silent error where it just appears to work. However, if we require both fields to either be present or absent, then they'd only be able to generate a macaroon that is valid for 10 minutes starting in 10 years, which would get rejected immediately, and they'd be able to determine something is wrong.
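A small sketch of that both-or-neither rule as a verification step (assuming the caveat timestamps have already been parsed into datetime values):

```python
def check_time_caveats(not_before, expires, now):
    """Enforce the both-or-neither rule for time caveats."""
    if (not_before is None) != (expires is None):
        # Only one of the pair present: reject outright.
        raise ValueError("not-before and expires must be supplied together")
    if not_before is None:
        return True  # no time restriction at all
    # A badly skewed clock now yields a window that fails immediately,
    # rather than a token that silently lives for years.
    return not_before <= now < expires
```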
Internal to Warehouse, we'd need a table that would store all of the initial root keys (`k0` in the above). Each of these keys would essentially be a tuple of `identifier, key` that the verification code could then look up. We would never expose the `k0` to end users. We might also want to store any initial caveats that were added to the macaroon given to the user, so that we can give the user a list of their macaroons as well as the caveats attached to each. Obviously we'd also need a UI for managing the macaroons a user has, with the ability to delete them (which would effectively revoke them and make any sub-macaroon worthless) as well as create them, ideally with a UI to add caveats to the initial macaroon (even though the system will support end users adding caveats on their own without talking to Warehouse, a UI in Warehouse will be a lot more user friendly for the common case).
There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user; HOWEVER, I think that we should always add an additional caveat that describes which user the macaroon belongs to, essentially scoping the macaroon to just that user (not the same as scoping it to a specific user in the `resources` above, but saying "this macaroon can effectively act as `$USER`"). While the first cut may always add that, and may have a database column that always links a macaroon to a specific user, adding that caveat to start with means that in the future, if we want to have macaroons that are not tied to a specific user, we will be able to do that without accidentally granting super privileges to all existing macaroons.
What do you think?
One nice thing about the above idea, is that end users can completely ignore the extra power granted by Macaroons if they want. They can simply treat it as an API key, generate a Macaroon via Warehouse, pass it to Twine as the password for their account and go on with life without giving it a second thought. Of course someone who wants to unlock additional power for delegation or resource constraining can opt into that and can utilize the additional benefits provided by Macaroons. The design of Macaroons is fairly brilliant in this aspect where it's very much a low cost abstraction for the basic use case, but enables you to delve deeper into it to unlock a lot more power.
In the hypothetical Travis example, the end user might not even be aware that Travis is adding additional caveats to their Macaroon (or that it's even possible to do that). It could easily be presented to them as nothing more than adding the API key to Travis, with Travis doing the extra expiration stuff completely behind the scenes for the benefit of the user.
I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire. While we generally trust TLS to protect these credentials, automatic scope limitation like that is basically zero cost and provides defense in depth so that in the case someone is able to look into the TLS stream (for example, a company MITM proxy) the credentials they get are practically useless once they've been used once.
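For illustration, a sketch of what that client-side attenuation could look like with `pymacaroons` (the caveat names are invented):

```python
import datetime

from pymacaroons import Macaroon


def attenuate_for_upload(serialized_token, file_sha256):
    """Derive a single-upload, short-lived macaroon entirely client-side."""
    macaroon = Macaroon.deserialize(serialized_token)
    macaroon.add_first_party_caveat(f"file-sha256: {file_sha256}")
    expires = datetime.datetime.utcnow() + datetime.timedelta(minutes=10)
    macaroon.add_first_party_caveat(f"expires: {expires.isoformat()}Z")
    # Send this attenuated token on the wire, never the long-lived one.
    return macaroon.serialize()
```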
To extend what @dstufft said -- we can have a constraint be "upload file
This means that while, of course, we all love TLS a lot, with this in place, TLS would not be needed for security -- even complete breakage of TLS would not allow someone to upload a package with malicious code.
> I do think that some sort of request signing scheme could be useful, in that in the case of a TLS break it limits the need to re-roll credentials across the board. I think that would be more generally fit for an "upload 2.0" API that would eventually sunset the current API, rather than extending the current API. Utilizing a bearer token authentication scheme today would mean that twine, etc. just work immediately, and we can constrain the effort of needing to get agreement between multiple parties to a point in the future when we actually want to design a new upload API.
I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.
> I think this ends up making a really nice system that can be retrofit to the current upload API, but that allows a lot of new capabilities for delegation and restricting delegation.
I like this idea too. +1.
> There is still the question of whether it makes sense to have these Macaroons able to be specified without a user or not. For right now, I think we should make them owned by a specific user; HOWEVER, I think that we should always add an additional caveat that describes which user the macaroon belongs to.
I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed.
I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.
> I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire.
This is definitely something that would be easy and valuable to do. It gives you the value of request signing, effectively.
I am sold on this. I will get an implementation of this in soon. Also, thanks @dstufft for teaching me about Macaroons. That is really valuable.
@lukesneeringer I should mention that after talking through this more with people, I think that the right implementation would look something like: when minting a macaroon, generate an opaque, unique value (e.g. from `os.urandom()`) and store that value in the database. The caveat would then effectively be `id: <opaque value>`, and verification would look up the opaque value from the database table; if it doesn't exist, then the macaroon is not valid. That allows us (and users) to revoke a single macaroon "tree" without having to revoke the entire set of macaroons. This scheme also allows us to not need to store root keys in the database.

Some replies to your comments:
> I like this idea. One thing we could do is allow either the API key to be sent directly (meaning that we get the constraint of effort you mention) or a signing algorithm, which then tools could opt in to.
We need a number of the values out of the macaroons in order to construct what the signing key should be in this hypothetical signing algorithm. It's possible we could do something where we rip enough stuff out of the macaroon format so that caveats are still sent along with the request, but not the actual HMAC signatures, so that the server could still construct what the expected signing key is. I'm not sure that would be worth the effort.
> I am okay with this provisionally but I think it is an important limitation. I do think that group permissions will be necessary. I do think it is reasonable to add groups first and then group level permissions, rather than the converse order that I originally proposed.
>
> I am a little confused about the user caveat. I would like to understand its purpose. Would we allow Macaroons to be moved? This seems implausible. Additionally, I do think we eventually need to end up with user-independent tokens. The point needs to be that the credentials continue to work after a user is no longer part of that group.
Basically, the user caveat is how you say "this macaroon (and by nature, all macaroons created from this macaroon) is scoped to only resources that X user has access to, and acts as if it were X user". The reason this is a caveat instead of just a column in the database (although it likely should be one of those too) is to keep our options open in the future, so that we can potentially start creating macaroons without that caveat (perhaps with a `resources: Project(foobar)` caveat instead) without accidentally upgrading all previous macaroons to unscoped.
So to start out with, we'd always include that `acts-as-user:` caveat, but maybe in the future we don't, and everything works fine.
Oh, and the opaque, unique value would also give us something we can enumerate to display a list of macaroons owned by a user (to allow them to delete/revoke unused ones) and allows us to do things like record in the database whenever they are used, so we can display the last time each one was used too.
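Putting the opaque-id scheme together, verification might look roughly like this sketch, where `lookup_macaroon` and `derive_root_key` are hypothetical helpers:

```python
from pymacaroons import Verifier


def verify_upload_macaroon(macaroon, db):
    """Check the id caveat against the database, then the HMAC chain."""
    verifier = Verifier()

    def check_id(caveat):
        if not caveat.startswith("id: "):
            return False  # let other satisfiers handle other caveats
        row = db.lookup_macaroon(caveat[len("id: "):])  # hypothetical lookup
        # Deleting the row revokes this macaroon and everything derived
        # from it; the row is also where last-used times could be recorded.
        return row is not None

    verifier.satisfy_general(check_id)
    # ... register satisfiers for the other supported caveats here ...

    # The root key is derived from the identifier rather than stored.
    key = derive_root_key(macaroon.identifier)  # hypothetical
    return verifier.verify(macaroon, key)
```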
All this sounds good. Also, apologies, I was replying inline before I read the entire post, so my first quote above is less relevant than I thought. If twine ever makes the addition you recommend to add the date range, it accomplishes the same thing as signing would.
(Poking to say this is in progress; a PR will be coming soon-ish.)
@lukesneeringer Hi! Would welcome seeing the PR! :) And if you'll be available for the Open Space/Birds of a Feather session during PyCon, we could talk about this then, too. (https://wiki.python.org/psf/PackagingSprints has more topics & people on it now.)
Yeah, I am about half done. It is going slower than planned due to time. I will be at the BOF and the first day of sprints and plan to finish it then. :-)
(Update for sprints: Work is going on this branch).
@lukesneeringer how far along is this? There is quite a bit in the branch, but it's hard to see what is still missing.
Very sorry for the delay! I have posted a pull request based on the work we started at US PyCon 2018 (back in May...): https://github.com/pypa/warehouse/pull/4599
I'm going to block off some time in the near future to be able to respond to feedback / iterate on this diff as needed. Thanks!
I botched a merge with upstream on that branch, see instead #4949.
I'm quite eager to see this feature, especially if it helps solve the issue of creating a credential that's safe to share in a code repo with collaborators. We have a situation in importlib_resources where we would like for the CI/CD toolchain to be able to publish releases, but GitLab doesn't provide robust encryption like Travis CI does... and they allow passwords to be unmasked by anybody with admin on the group/repository. Meaning that because I've configured my PyPI password for gitlab.com/python-devs, any admin of python-devs can release packages as me O_O.
Will this access token feature work to satisfy that need? That is, will it be possible for one of the maintainers or owners of a PyPI project to generate a token (or tokens) for a limited subset of projects to which he has access and then use that token for CI uploads just to those projects?
@jaraco Yes that's the plan.
@dstufft has the macaroon thing gone away, or is it still on the books?
Pardon my ignorance, but it seems the extra macaroon functionality (derived tokens) is nice, but not widely used (?) and not strictly needed.
IMO (coming from user's perspective), JWT (or paseto, or an opaque token) would do the job just as well (perhaps with extra server round trip when derived token is needed), and is more common/understandable/supported. Am I about right?
How different is this from OAuth?
Good news - this work is now underway, and you can expect to see @woodruffw working on it. Thanks to Open Tech Fund for making this work possible!
@woodruffw and @nlhkabu will be implementing per-user API keys as an alternative form of authentication in the PyPI auth flows (project maintainers/owners will be able to choose whether to use username/password OR an API key to authenticate when uploading packages). These will be application-specific tokens scoped to individual users/projects (and the task will also cover adding token-based login support to `twine` and `setuptools`, to improve the security of uploads).
Subtasks (each of which will probably turn into its own GitHub issue):
So once this is done:
We'll be working on this in 2019. The Packaging Working Group, seeking donations and further grants to fund more work, got some new funding from the Open Technology Fund, and the audit log is part of the current grant-funded project.
Looks like this is next up on the agenda. I wanted to take a moment to throw out a few specifications we probably need for this 1st generation of API Tokens:
Questions:
I spoke with @woodruffw yesterday and he's excited about getting started on this feature this month! He'll be working on the Warehouse side.
My understanding, based on @ewdurbin's comment (above), is that we won't need changes to Twine or other client-side tooling, because the upload-scoped API key will be a drop-in replacement for existing auth. But I'm open to being shown I am wrong here. :-) (cc @jaraco @pganssle FYI.)
I hope this will still be based on macaroons as outlined above, because that will allow things like one-time tokens created client-side to push packages from a devpi instance to pypi. Currently we send the pypi credentials to the devpi-server, which isn't ideal and wouldn't be changed if we can't amend the token.
> - Should we require a username be supplied alongside API Tokens at upload time, or is the token itself sufficient?
I plan to model tokens as linked tightly to their originating users, so I think just the token will be sufficient 🙂
> I hope this will still be based on macaroons as outlined above, because that will allow things like one-time tokens created client-side to push packages from a devpi instance to pypi. Currently we send the pypi credentials to the devpi-server, which isn't ideal and wouldn't be changed if we can't amend the token.
I wasn't planning on using macaroons for the first implementation, for two reasons:
My (current) plan is to make the implementation flexible, to enable future iterations to use macaroons.
Thanks, @fschulze, for the question. @woodruffw I think the idea of allowing single-use tokens is a good one, and I'm not sure if your planned approach handles it or not. Can you say more about constraints modeled on the PyPI side, and how a devpi use case might work?
There is already a PR using macaroons which looks promising, why redo this work with less functionality? https://github.com/pypa/warehouse/pull/4949
I don't think the size of macaroons is an issue. Such tokens will be stored in a file, password safe or environment variable. Nobody will type them out.
I see several use cases for macaroons which aren't possible otherwise:

- Pushing a release from devpi to PyPI: mint a restricted token client-side and send it along with the `push` request to devpi-server. With a simple upload token, the compromise of that token along the way is much more severe.
- Using CI for releases.
- Delegating maintenance.
I'm sure people can come up with more once the concept is understood and the necessary caveats are in place.
Hi Florian,
It's a shame that there's no Wikipedia page for macaroons, as the Google paper is relatively hard to read.
I feel that the same class of problems that x509 implementations suffered from is generally applicable to macaroon implementations.
The server and the macaroon "deriver" must agree on the language of the additional restrictions, aka "caveats".
You've mentioned a few restrictions:

- `hash="abcdef"`
- `project="pyfoo"`
- `version="2.*"`

(in some language)

Someone may want to come up with:

- `os="win"`
- `arch="arm"`
- `last_valid_date="2020-01-01"`
- `python_version=">3.4 <3.8"`
- `python_implementation="pypy"`
- `license="MIT"`
- `moon_phase="waxing"`

The list could go on forever :)
Changes to the language will be hard once there are live clients; client and server implementations would have to be synchronised...
Given server S with secret key K, intermediaries M1, M2, M3, M4, etc., and final macaroons MFn, imagine that one of M1..M4 was leaked. The server gets flooded with MFn's; what can be done?

- blacklisting K means banning all derived macaroons, even those that were not compromised
- blacklisting M4 runs the risk that the actual leak was M3, and the attacker will issue a new M4'
- blacklisting all possible M4'''' requires unbounded storage

In other words, the server (admin) would have to be very smart and guess correctly which intermediary to blacklist.
If there's an automated system to report lost tokens, the requests have to be signed (by parent? self-signed?), and that system too can be abused.
My 2c: I'd rather see a simpler, working system sooner.
Just to clear a few things up:
> Changes to the language will be hard once there are live clients; client and server implementations would have to be synchronised...
This really isn't true (or at least, isn't a big deal). If a new caveat type is going to be added, the only real constraint is that support must be added to Warehouse first. If a client never updates to add support for that caveat, the only downside for that client is that it won't be able to use that specific caveat. Everything else will continue to function correctly; even if that client is passed a macaroon that already has that caveat applied, things will continue to work fine. A client only needs to understand caveats it personally adds to a macaroon. Clients aren't allowed to infer any information from existing caveats on a macaroon (because they can't validate them), nor can they remove caveats (since that would invalidate the HMAC). All they can do is append new caveats.
> Given server S with secret key K, intermediaries M1, M2, M3, M4, etc., and final macaroons MFn, imagine that one of M1..M4 was leaked. The server gets flooded with MFn's; what can be done?
>
> - blacklisting K means banning all derived macaroons, even those that were not compromised
> - blacklisting M4 runs the risk that the actual leak was M3, and the attacker will issue a new M4'
> - blacklisting all possible M4'''' requires unbounded storage
>
> In other words, the server (admin) would have to be very smart and guess correctly which intermediary to blacklist.
There wouldn't be a singular secret K. Basically, the way the PoC implements this is that every time a user goes into Warehouse and asks for an API key, a new secret key is generated for that key. So a user can have an unbounded number of macaroons, all with a different lineage to different secret keys. There is no such thing as revoking a specific intermediary; you can only revoke the entire secret key and any macaroons derived from it.
This is basically no different than the "simple" API key solution. Users are 100% in control of the blast radius of revoking their keys. If they use 5 different computers and 3 different services and they generate a single API key, OR they generate a single macaroon and hand the same one to all of them, in both cases they will need to revoke that singular key and replace it in all services. Likewise, if they generate a different API key or create a new Macaroon for each service, then if one is compromised they've successfully limited the blast radius to that single service and only have to revoke the one key/macaroon.
Effectively, the macaroon system as I originally intended it to be implemented, is basically just a better form of simple API keys. Users who don't wish to use the extended power of macaroons can treat them entirely like simple API keys and never think about them any differently. Users (or client tools like twine) that want to use that extended power can do so.
Basically the only downside (other than some complexity on the server side) for someone who wishes to use a macaroon as a simple API key, is that they're quite a bit longer than a simple API key typically is.
@dimaqq Funny you should ask. I started a draft of a wikipedia article on macaroons a while ago, which got deleted for lack of submission or updates: https://en.wikipedia.org/wiki/Draft:Macaroons_(authorization) If it makes sense to the folks here, and it seems "notable" (in Wikipedia terms) let's clean it up and submit it!
> It's a shame that there's no Wikipedia page for macaroons, as the Google paper is relatively hard to read.
Hrm, the Mozilla Tech Talk seems to be gone: https://air.mozilla.org/macaroons-cookies-with-contextual-caveats-for-decentralized-authorization-in-the-cloud/ That was pretty good. Did anyone find a copy of that somewhere, or knows someone at mozilla who could look into why it's gone?
Good tip, @fschulze. It turns out the Internet Archive has the original blurb: Mozilla Tech Talk: Macaroons, and the 54-minute video of the "air mozilla" talk by Úlfar Erlingsson is still available at Macaroons talk by Úlfar Erlingsson
I found a better version on YouTube: https://www.youtube.com/watch?v=CGBZO5n_SUg
Just wrapped up a call with @ewdurbin, @dstufft, and @brainwane and came up with the following roadmap:
Began work on this in #6084.
I just realised that there's a patent 😨 https://patents.google.com/patent/US9397990 Who can check if/how this might affect PYPA?
Very good observation and question!
I'm asking Google to add that patent to their Open Patent Non-Assertion Pledge. It seems like a no-brainer to me...
> Who can check if/how this might affect PYPA?
I've requested that @VanL, the PSF's General Counsel weigh in on this question.
A scary number of people embed their PyPI username and password in their Travis config (using Travis encrypted variables), to enable automatic releases for certain branches (Travis even has a guide for it).
In addition, the packaging docs example encourages users to save their password in plaintext on disk in their `.pypirc` (they can of course use twine's password prompting, but I wonder how many read that far, rather than just copying the example verbatim?).

Whilst in an ideal world credentials of any form wouldn't be saved unencrypted to disk (or given to a third party such as Travis), and users instead prompted every time - I don't think this is realistic in practice.
API keys would offer the following advantages:

- The docs example could stop encouraging a real password in the `password` field in `.pypirc`, leaving a much safer choice between password prompting every time, or creating an API key that could be saved to disk.

Many thanks :-)
(I've filed this against warehouse since I'm presuming this is beyond the scope of maintenance-only changes being made to the old PyPI codebase)