Trusted Publishing: Support self-hosted GitLab instances #15838

Status: Open. Opened by @di 6 months ago.

From https://github.com/pypi/warehouse/issues/13575:

Since the iss will be whatever the URL of the GitLab instance is, I'm assuming that for this provider we're not supporting self-hosted instances, right? That is, for now only "iss": "https://gitlab.com" will be supported.

Scanning the rest of this issue quickly, I didn't see a direct reply to this (apologies if I missed it). I don't have any technical experience with this, so: would supporting self-hosted GitLab instances be feasible in the future? I'm specifically interested in CERN's GitLab instance (cf. https://github.com/di/id/issues/216), as there are multiple projects there that publish to PyPI and that we'd like to transition to Trusted Publishers.

Originally posted by @matthewfeickert in https://github.com/pypi/warehouse/issues/13575#issuecomment-2072342830

I think it would be technically possible to support a self-hosted instance like this (I see that https://gitlab.cern.ch/.well-known/openid-configuration and https://gitlab.cern.ch/oauth/discovery/keys are both publicly available, which is all we need for verification). The real question is what process we would use to add support for all these one-off issuers.
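For reference, those two endpoints are linked by OpenID Connect Discovery: the issuer's /.well-known/openid-configuration document advertises an issuer and a jwks_uri pointing at its signing keys. Below is a minimal, hypothetical pre-flight check on such a document (no network access; discovery_url and discovery_problems are illustrative names, not Warehouse code). The exact issuer match follows the Discovery spec; requiring the keys to be served over HTTPS from the issuer's own host is an assumed extra policy.

```python
from __future__ import annotations

from urllib.parse import urlparse


def discovery_url(issuer: str) -> str:
    """Build the OIDC discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def discovery_problems(issuer: str, doc: dict) -> list[str]:
    """List reasons why a discovery document is unusable for `issuer`.

    An empty list means the document passed these illustrative checks.
    """
    problems: list[str] = []
    # OIDC Discovery: the advertised issuer must exactly match the expected one.
    if doc.get("issuer") != issuer:
        problems.append(f"issuer mismatch: {doc.get('issuer')!r} != {issuer!r}")
    jwks_uri = doc.get("jwks_uri")
    if not jwks_uri:
        problems.append("missing jwks_uri")
    else:
        # Assumed policy (not a spec requirement): keys are served over HTTPS
        # from the same host as the issuer.
        parsed = urlparse(jwks_uri)
        if parsed.scheme != "https" or parsed.hostname != urlparse(issuer).hostname:
            problems.append(f"jwks_uri not on the issuer's host: {jwks_uri!r}")
    return problems


doc = {
    "issuer": "https://gitlab.cern.ch",
    "jwks_uri": "https://gitlab.cern.ch/oauth/discovery/keys",
}
print(discovery_url("https://gitlab.cern.ch"))
print(discovery_problems("https://gitlab.cern.ch", doc))  # → []
```

The document here is only shaped like a GitLab discovery response; it is not a copy of CERN's actual configuration.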

I think one thing we could do here would be to allow the user to optionally configure the iss field as well. My main concern would be that this would allow anyone to essentially masquerade as a GitLab instance, and publish from anywhere, which would give me a little less confidence in the security of the publish event as a user.

Another option is that we allow-list certain issuers for projects in certain organizations, and manually handle these on a case-by-case basis.
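Concretely, the allow-list idea could boil down to a lookup like this minimal sketch. The organization name, the issuer URLs, and the rule that https://gitlab.com stays available to everyone are hypothetical placeholders; a real version would presumably be stored server-side and only edited by admins after vetting an instance.

```python
from __future__ import annotations

# Hypothetical allow-list: organization -> vetted self-hosted issuers.
ALLOWED_ISSUERS: dict[str, frozenset[str]] = {
    "example-org": frozenset({"https://gitlab.example.com"}),
}

# Assumption: the public gitlab.com issuer stays available to everyone.
DEFAULT_ISSUERS = frozenset({"https://gitlab.com"})


def issuer_allowed(organization: str | None, issuer: str) -> bool:
    """Return True if `issuer` may be used by projects in `organization`."""
    if issuer in DEFAULT_ISSUERS:
        return True
    if organization is None:
        # Projects outside an organization only get the default issuers.
        return False
    return issuer in ALLOWED_ISSUERS.get(organization, frozenset())


print(issuer_allowed("example-org", "https://gitlab.example.com"))  # True
print(issuer_allowed("other-org", "https://gitlab.example.com"))  # False
```

Scoping the allow-list per organization means a vetted instance can't be used to publish to unrelated projects.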

Open to ideas though!

I think one thing we could do here would be to allow the user to optionally configure the iss field as well. My main concern would be that this would allow anyone to essentially masquerade as a GitLab instance, and publish from anywhere, which would give me a little less confidence in the security of the publish event as a user.

Yeah, I think this poses a decent risk 😅. I think a variant of this came up with self-hosted GitHub Enterprise users as well, and IIRC my thoughts there were:

Trusted publishing offers diminishing security returns for smaller IdPs: a big part of the security model/value for trusted publishing is the idea that each trusted IdP is large, has on-call staff, has key revocation/rotation policies that they can apply in the event of an emergency, etc. These might be true for smaller IdPs as well, but they're less guaranteed. In the setting of smaller IdPs, IMO manually configured API tokens actually provide better security properties (since revocation of publishing credentials doesn't potentially require handling a rarely-interacted-with internal IdP service).
Once the issuer becomes configurable, the "shape" of each OIDC JWT becomes malleable: PyPI would need to be able to distinguish GitHub-looking self-hosted IdPs from GitLab-looking ones, etc., since the claim sets aren't consistent. This also poses a timing challenge: e.g. GitHub deploys a claim change to their central IdP to fix a security problem, but we can't enforce it until all self-hosted GitHub instances have rolled their IdP services over as well.
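To make the "shape" problem concrete, here is a hypothetical sketch that classifies a decoded claim set as GitHub-shaped or GitLab-shaped and reports which required claims are missing. The claim names are real claims that GitHub Actions and GitLab CI put in their ID tokens, but the particular required subsets are assumptions for illustration, not what PyPI enforces.

```python
from __future__ import annotations

# Illustrative required claims per IdP "shape". The claim names are real
# GitHub Actions / GitLab CI ID token claims; the chosen subsets are assumptions.
SHAPES: dict[str, frozenset[str]] = {
    "github": frozenset(
        {"iss", "sub", "aud", "repository", "repository_owner", "job_workflow_ref"}
    ),
    "gitlab": frozenset(
        {"iss", "sub", "aud", "project_path", "namespace_path", "ci_config_ref_uri"}
    ),
}


def detect_shape(claims: dict) -> str | None:
    """Return the single shape whose required claims are all present, else None."""
    present = set(claims)
    matches = [name for name, required in SHAPES.items() if required <= present]
    # Refuse to guess when nothing (or more than one shape) matches.
    return matches[0] if len(matches) == 1 else None


def missing_claims(shape: str, claims: dict) -> set[str]:
    """Claims required for `shape` that this token lacks, for error messages."""
    return set(SHAPES[shape] - set(claims))
```

The timing problem shows up directly here: adding a claim to a shape's required set immediately rejects tokens from every instance that hasn't rolled out that claim yet, and missing_claims is what would tell those users why.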

With that said, I think the allow-list on an organization basis could work! But I think it would require some architectural changes to the current implementation, particularly around claim flexibility 🙂

Another option is that we allow-list certain issuers for projects in certain organizations, and manually handle these on a case-by-case basis.

This seems the most reasonable, as otherwise there could easily be abuse. I am mindful, however, that this would make the maintainer team responsible for a growing list of issuers. Would it help to have self-hosted instances that want to use Trusted Publishers self-identify for vetting by making a PR to the system that controls the allow-list? Or would that not actually decrease the maintenance burden much?

Would it help to have self-hosted instances that want to use Trusted Publishers self-identify for vetting by making a PR to the system that controls the allow-list? Or would that not actually decrease the maintenance burden much?

I think this would help a bit, but the bulk of the maintenance burden will (unfortunately) probably still be papering over the small differences between each self-hosted IdP.

I suppose we could reduce the burden of that by enforcing a baseline set of claims for each "shape" of IdP (e.g. GitHub, GitLab) via .well-known/openid-configuration on each issuer, but GitLab at minimum has an outstanding issue with that configuration being incomplete: https://gitlab.com/gitlab-org/gitlab/-/issues/428061
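A baseline check against the discovery document might look like this minimal sketch, which compares the document's claims_supported list (a RECOMMENDED field in OpenID Connect Discovery) against a hypothetical baseline of GitLab claims. The baseline set is an assumption; because the linked GitLab issue is about this configuration being incomplete, a check like this could currently reject legitimate instances.

```python
from __future__ import annotations

# Hypothetical baseline that every "GitLab-shaped" issuer must advertise.
GITLAB_BASELINE = frozenset(
    {"sub", "project_path", "namespace_path", "ref", "ref_type", "ci_config_ref_uri"}
)


def unadvertised_claims(doc: dict, baseline: frozenset[str] = GITLAB_BASELINE) -> list[str]:
    """Sorted baseline claims missing from the document's claims_supported list."""
    advertised = set(doc.get("claims_supported", []))
    return sorted(baseline - advertised)


print(unadvertised_claims({"claims_supported": ["iss", "sub", "aud"]}))
```

Unlike a per-token check, this runs once when an issuer is registered, so problems surface before anyone tries to publish.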

@woodruffw I'm at a GitLab hackathon (and a GitLab team member here), and we're trying to figure out exactly what you would need in the openid-configuration file to move this forward. As I understand it, that would not necessarily solve any of the verification issues discussed above. Any pointers on how we can move forward here once the issue you listed is solved? Most likely the warehouse software would also need to be modified to add some sort of approval process for self-managed instances?

I suppose we could reduce the burden of that by enforcing a baseline set of claims for each "shape" of IdP (e.g. GitHub, GitLab) via .well-known/openid-configuration on each issuer, but GitLab at minimum has an outstanding issue with that configuration being incomplete: https://gitlab.com/gitlab-org/gitlab/-/issues/428061

I would expect claims from self-managed instances to be roughly the same as those from the primary hosted instance (although it would be nice if the well-known claims were accurate, so we could verify this in advance).

Hey @nickveenhof! Sorry for the belated response -- I was away from a computer when you sent your original message, and this got lost in the stack.

You're correct that this wouldn't solve the verification issues themselves, since the JWTs issued by a given IdP are the ultimate source of ground truth about the identity being presented.

However, it would allow us to build a cleaner UX for self-hosted instances: one simple way for us to enable self-hosted support is to have people give us a single well-known configuration URL, which we could then validate against the basic "shape" of the OIDC credentials we expect to receive. Similarly, it would help us diagnose problems and present better error messages when validation requirements change (e.g. if we strengthen the constraints on GitLab-shaped publishers, but some self-hosted instances haven't been updated to include the newly required claims yet).

Hey @woodruffw @di @nickveenhof!

python-gitlab maintainer here and member of the code.siemens.com team. I opened https://gitlab.com/gitlab-org/gitlab/-/merge_requests/170072, which I think should resolve the upstream issue you linked above that's blocking this.

I'm guessing this might get merged soon and be in the next release(s). What would be the next steps here once the discovery endpoint provides all the supported claims?

Note I don't expect there'd be that many deployments happening to pypi.org from self-hosted, as it's effectively source-available and github.com is where OSS Python projects end up hosting their repos. But it would still be best to avoid static credentials where possible.

I see there's some discussion around how to vet self-hosted instances. Our team runs one of the largest self-hosted instances after gitlab.com, with a growing base of ~77k registered users, so we would be interested in getting on board if there's a pilot project! We're a GitHub secret scanning partner and automatically revoke any Siemens API keys/tokens that GitHub reports to us, and we could most likely extend that.

Our team's also pretty well versed in GitLab (we contribute regularly, including the initial OIDC provider implementation) so if there's anything missing we could probably add that upstream 👍

Note I don't expect there'd be that many deployments happening to pypi.org from self-hosted

Actually, at my $work, we want to do exactly that. We have an internal self-hosted GitLab instance and want to release packages to PyPI all the time. We'd like to adopt Trusted Publishing.

What would be the next steps here once the discovery endpoint provides all the supported claims?

I think we probably want to do what I mentioned above:

Another option is that we allow-list certain issuers for projects in certain organizations, and manually handle these on a case-by-case basis.

So that would mean creating the views & routes that would allow an organization owner to configure a custom issuer, and updating the Trusted Publishing logic to take this into consideration when publishing.
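As a rough sketch of the publish-time side, a publisher record for a self-hosted instance might pin the issuer alongside the project path and CI config file, then match them against a token whose signature has already been verified with that issuer's keys. SelfHostedGitLabPublisher is a hypothetical name, and the ci_config_ref_uri format (<host>/<project_path>//<file>@<ref>) is assumed from GitLab's ID token documentation; Warehouse's actual GitLab publisher may check different or additional claims.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SelfHostedGitLabPublisher:
    """Hypothetical publisher record pinned to one (vetted) GitLab issuer."""

    issuer: str  # e.g. "https://gitlab.example.com"
    project_path: str  # "namespace/project" on that instance
    ci_config_file: str  # e.g. ".gitlab-ci.yml"

    def matches(self, claims: dict) -> bool:
        """Check already signature-verified claims against this publisher."""
        if claims.get("iss") != self.issuer:
            return False
        if claims.get("project_path") != self.project_path:
            return False
        # Assumed format: "<host>/<project_path>//<ci config file>@<ref>"
        host = urlparse(self.issuer).hostname
        prefix = f"{host}/{self.project_path}//{self.ci_config_file}@"
        return str(claims.get("ci_config_ref_uri", "")).startswith(prefix)
```

Pinning the issuer in the publisher record is what keeps a project path on one instance from being claimed by an identically named project on another instance.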
