pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Trusted Publishing: Support self-hosted GitLab instances #15838

Open · di opened this issue 5 months ago

di commented 5 months ago

From https://github.com/pypi/warehouse/issues/13575:

> Since the iss will be whatever the URL of the GitLab instance is, I'm assuming that for this provider we're not supporting self-hosted instances, right? That is, for now only "iss": "https://gitlab.com" will be supported.
>
> Scanning the rest of this issue quickly, I didn't see a direct reply on this (apologies if I missed it). I don't have any technical experience with this, so are self-hosted GitLab instances something that would be feasible to support in the future? I'm specifically interested in CERN's GitLab instance (cf. https://github.com/di/id/issues/216), as there are multiple projects there that publish to PyPI that we'd like to transition to Trusted Publishers.

Originally posted by @matthewfeickert in https://github.com/pypi/warehouse/issues/13575#issuecomment-2072342830

di commented 5 months ago

I think it would be technically possible to support a self-hosted instance like this: I see that https://gitlab.cern.ch/.well-known/openid-configuration and https://gitlab.cern.ch/oauth/discovery/keys are both publicly available, which is all we need for verification. The real question is what process we would use to add support for all these one-off issuers.
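For illustration, a minimal Python sketch of the discovery step described above, assuming the standard OpenID Connect Discovery layout: the configuration document lives at the issuer URL plus `/.well-known/openid-configuration`, its `issuer` field must match the expected issuer exactly, and its `jwks_uri` points at the published signing keys. The function names are hypothetical and this is not Warehouse's actual code; the example document is a hand-written stand-in, not one fetched from gitlab.cern.ch.

```python
import json
import urllib.request
from urllib.parse import urlsplit

WELL_KNOWN_SUFFIX = "/.well-known/openid-configuration"


def discovery_url(issuer: str) -> str:
    """Build the OIDC Discovery URL: drop any trailing slash, then append the suffix."""
    return issuer.rstrip("/") + WELL_KNOWN_SUFFIX


def fetch_discovery_document(issuer: str, timeout: float = 10.0) -> dict:
    """Fetch and parse an issuer's discovery document (requires network access)."""
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as resp:
        return json.load(resp)


def check_discovery_document(issuer: str, doc: dict) -> list[str]:
    """Return problems that would block token verification for `issuer`."""
    problems = []
    # The advertised issuer must match the one we expect, character for character.
    if doc.get("issuer") != issuer:
        problems.append(f"issuer mismatch: expected {issuer!r}, got {doc.get('issuer')!r}")
    # The JWKS endpoint is where the token signing keys are published.
    jwks_uri = doc.get("jwks_uri")
    if not jwks_uri:
        problems.append("missing jwks_uri")
    elif urlsplit(jwks_uri).scheme != "https":
        problems.append(f"jwks_uri is not HTTPS: {jwks_uri!r}")
    return problems


# Hand-written stand-in for a self-hosted instance's document (not fetched).
example_doc = {
    "issuer": "https://gitlab.cern.ch",
    "jwks_uri": "https://gitlab.cern.ch/oauth/discovery/keys",
}
print(check_discovery_document("https://gitlab.cern.ch", example_doc))  # prints []
```

Passing the checks only shows that tokens from the issuer can be verified; it says nothing about whether the issuer should be trusted, which is the process question raised above.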

I think one thing we could do here would be to allow the user to optionally configure the iss field as well. My main concern would be that this would allow anyone to essentially masquerade as a GitLab instance, and publish from anywhere, which would give me a little less confidence in the security of the publish event as a user.

Another option is that we allow-list certain issuers for projects in certain organizations, and manually handle these on a case-by-case basis.
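A per-organization allow-list could be as simple as a lookup table consulted when a publisher is configured. The sketch below is a hypothetical illustration, not how Warehouse models organizations: the organization name, the self-hosted URL, and `issuer_allowed` are all placeholders.

```python
from typing import Optional

# Issuers every project may use (only the primary hosted instance, for now).
DEFAULT_ISSUERS = frozenset({"https://gitlab.com"})

# Hypothetical per-organization allow-list, maintained by hand after vetting each
# self-hosted issuer. The organization and URL below are placeholders.
ORG_ISSUER_ALLOWLIST: dict[str, frozenset[str]] = {
    "example-org": frozenset({"https://gitlab.example.org"}),
}


def issuer_allowed(issuer: str, organization: Optional[str]) -> bool:
    """Decide whether a publisher for a project in `organization` may use `issuer`."""
    if issuer in DEFAULT_ISSUERS:
        return True
    if organization is None:
        # Projects outside any organization only get the default issuers.
        return False
    return issuer in ORG_ISSUER_ALLOWLIST.get(organization, frozenset())
```

Keeping the allow-list as plain data means that vetting a new issuer is a small, reviewable change, while everything not explicitly listed is rejected by default.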

Open to ideas though!

woodruffw commented 5 months ago

> I think one thing we could do here would be to allow the user to optionally configure the iss field as well. My main concern would be that this would allow anyone to essentially masquerade as a GitLab instance, and publish from anywhere, which would give me a little less confidence in the security of the publish event as a user.

Yeah, I think this poses a decent risk 😅. I think a variant of this came up with self-hosted GitHub Enterprise users as well, and IIRC my thoughts there were:

  1. Trusted publishing offers diminishing security returns for smaller IdPs: a big part of the security model/value for trusted publishing is the idea that each trusted IdP is large, has on-call staff, has key revocation/rotation policies that they can apply in the event of an emergency, etc. These might be true for smaller IdPs as well, but they're less guaranteed. In the setting of smaller IdPs, IMO manually configured API tokens actually provide better security properties (since revocation of publishing credentials doesn't potentially require handling a rarely-interacted-with internal IdP service).
  2. Once the issuer becomes configurable, the "shape" of each OIDC JWT becomes malleable: PyPI would need to be able to distinguish GitHub-looking self-hosted IdPs from GitLab-looking ones, etc., since the claim sets aren't consistent. This also poses a timing challenge: e.g. GitHub deploys a claim change to their central IdP to fix a security problem, but we can't rely on it until all self-hosted GitHub instances roll their IdP service over as well.
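Once `iss` no longer tells PyPI which kind of IdP issued a token, the shape has to be inferred from the claims themselves. A minimal sketch of that inference, assuming a hypothetical table of "marker" claims per shape: the claim names are taken from GitHub Actions and GitLab CI/CD ID tokens, but choosing these particular markers is an assumption, not PyPI's actual logic.

```python
from collections.abc import Mapping
from typing import Optional

# Illustrative marker claims per IdP shape. These are small subsets of what
# GitHub Actions and GitLab CI/CD emit, not PyPI's real requirements.
SHAPE_MARKERS: dict[str, frozenset[str]] = {
    "github": frozenset({"repository", "repository_owner", "job_workflow_ref"}),
    "gitlab": frozenset({"project_path", "namespace_path", "ci_config_ref_uri"}),
}


def classify_shape(claims: Mapping[str, object]) -> Optional[str]:
    """Return the single shape whose markers are all present, or None if unknown or ambiguous."""
    matches = [
        shape
        for shape, markers in SHAPE_MARKERS.items()
        if markers.issubset(claims.keys())
    ]
    return matches[0] if len(matches) == 1 else None
```

Refusing to classify a token that matches more than one shape is deliberate: an ambiguous claim set is exactly the malleability described in point 2, so the safe answer is to reject it rather than guess.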

With that said, I think an allow-list on a per-organization basis could work! But it would require some architectural changes to the current implementation, particularly around claim flexibility 🙂

matthewfeickert commented 5 months ago

> Another option is that we allow-list certain issuers for projects in certain organizations, and manually handle these on a case-by-case basis.

This seems the most reasonable approach, as otherwise there could easily be abuse. I am mindful, however, that this makes the maintainer team responsible for a growing list of issuers. Would it help to have self-hosted instances that want to use Trusted Publishers self-identify for vetting by making a PR to the system that controls the allow-list? Or would that not actually reduce the maintenance burden much?

woodruffw commented 5 months ago

> Would it help to have self-hosted instances that want to use Trusted Publishers self-identify for vetting by making a PR to the system that controls the allow-list? Or would that not actually reduce the maintenance burden much?

I think this would help a bit, but the bulk of the maintenance burden will (unfortunately) probably still be papering over the small differences between each self-hosted IdP.

I suppose we could reduce the burden of that by enforcing a baseline set of claims for each "shape" of IdP (e.g. GitHub, GitLab) via .well-known/openid-configuration on each issuer, but GitLab at minimum has an outstanding issue with that configuration being incomplete: https://gitlab.com/gitlab-org/gitlab/-/issues/428061
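A baseline check against each issuer's `.well-known/openid-configuration` could look roughly like this sketch. It relies on the standard `claims_supported` field of an OIDC Discovery document; the baseline set and function name are hypothetical. An issuer whose document omits `claims_supported` fails every check, which is why an incomplete configuration document blocks this approach.

```python
from collections.abc import Mapping

# Hypothetical baseline for GitLab-shaped issuers; PyPI's actual requirements may differ.
GITLAB_BASELINE = frozenset({"sub", "project_path", "namespace_path", "ref", "ref_type"})


def missing_baseline_claims(
    discovery_doc: Mapping[str, object], baseline: frozenset[str]
) -> set[str]:
    """Return baseline claims the issuer does not advertise in `claims_supported`."""
    advertised = discovery_doc.get("claims_supported")
    if not isinstance(advertised, list):
        # No usable claims_supported: nothing can be verified ahead of time.
        return set(baseline)
    return set(baseline) - set(advertised)
```

As noted later in the thread, this only checks what an issuer claims it will send; the JWTs themselves still have to be validated on every publish.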

nickveenhof commented 4 months ago

@woodruffw I'm at a GitLab hackathon (and a GitLab team member here), and we're trying to figure out exactly what you would need in the openid-configuration file to move this forward. As I understand it, that would not necessarily solve any of the verification issues discussed above. Any pointers on how we can move forward here once we solve the issue you listed? Most likely the warehouse software would also need to be modified to have some sort of approval process for self-managed instances?

di commented 3 months ago

> I suppose we could reduce the burden of that by enforcing a baseline set of claims for each "shape" of IdP (e.g. GitHub, GitLab) via .well-known/openid-configuration on each issuer, but GitLab at minimum has an outstanding issue with that configuration being incomplete: https://gitlab.com/gitlab-org/gitlab/-/issues/428061

I would expect claims from self-managed instances to be roughly the same as those from the primary hosted instance (although it would be nice if the well-known claims were accurate so we could verify this in advance).

woodruffw commented 3 months ago

Hey @nickveenhof! Sorry for the belated response -- I was away from a computer when you sent your original message, and this got lost in the stack.

You're correct that this wouldn't solve the verification issues themselves, since the JWTs issued by a given IdP are the ultimate source of ground truth about the identity being presented.

However, it would allow us to build a cleaner UX for self-hosted instances: one simple way for us to enable self-hosted support is by having people give us a single well-known configuration URL, which we could then validate for the basic "shape" of expected forthcoming OIDC credentials. Similarly, it would help us diagnose problems and present better diagnostics when validation requirements change (e.g. if we strengthen the constraints on GitLab-shaped publishers, but some self-hosted instances haven't updated to contain newly required claims yet).
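A rough sketch of that UX, assuming a single user-supplied configuration URL and a hypothetical, versioned list of required claims for GitLab-shaped issuers. The version labels, claim choices, and function names are illustrative only. Tying each missing claim to the requirement version that introduced it is one way to produce the "not updated yet" diagnostics described above.

```python
from collections.abc import Mapping
from typing import Optional
from urllib.parse import urlsplit

SUFFIX = "/.well-known/openid-configuration"

# Hypothetical, versioned requirements: each entry lists claims that became
# required at that version. Labels and claim choices are illustrative only.
GITLAB_REQUIREMENTS: list[tuple[str, frozenset[str]]] = [
    ("v1", frozenset({"project_path", "ref"})),
    ("v2", frozenset({"ci_config_ref_uri"})),
]


def issuer_from_config_url(config_url: str) -> Optional[str]:
    """Derive the issuer from a user-supplied discovery URL, or None if malformed."""
    if urlsplit(config_url).scheme != "https" or not config_url.endswith(SUFFIX):
        return None
    return config_url[: -len(SUFFIX)]


def diagnose(config_url: str, doc: Mapping[str, object]) -> list[str]:
    """Return human-readable problems with a self-hosted issuer's configuration."""
    issuer = issuer_from_config_url(config_url)
    if issuer is None:
        return [f"not an HTTPS OIDC discovery URL: {config_url!r}"]
    messages: list[str] = []
    if doc.get("issuer") != issuer:
        messages.append(f"document issuer {doc.get('issuer')!r} != {issuer!r}")
    raw = doc.get("claims_supported")
    advertised = set(raw) if isinstance(raw, list) else set()
    # Report each missing claim together with the version that made it required.
    for version, claims in GITLAB_REQUIREMENTS:
        for claim in sorted(claims - advertised):
            messages.append(f"claim {claim!r} (required since {version}) not advertised")
    return messages
```

For example, an instance that advertises only `project_path` and `ref` would get a single message saying that `ci_config_ref_uri` (required since v2) is not advertised, which points its administrators at the specific upgrade they are missing.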