weaveworks / weave-gitops

Weave GitOps provides insights into your application deployments, and makes continuous delivery with GitOps easier to adopt and scale across your teams.
https://docs.gitops.weave.works/
Apache License 2.0
906 stars 152 forks source link

Dex installation instructions unclear about some things #3263

Open kingdonb opened 1 year ago

kingdonb commented 1 year ago

Starting from this doc: https://docs.gitops.weave.works/docs/guides/setting-up-dex/

The docs for Weave GitOps covering Dex and OIDC have some unexplained parts:

https://github.com/weaveworks/weave-gitops/blob/7855a6b1e30efb52f932bcae43b525cdc958c90b/website/versioned_docs/version-0.14.1/guides/setting-up-dex.md#L113

In particular I noted that there is a secret under staticClients and the whole staticClients section is never really explained.

I went to Dex looking for guidance about what this is, and all I could come up with was: https://github.com/dexidp/helm-charts/blob/b5e75c51c0a09edcf90bb2ab81baed4ef5c6a7b9/charts/dex/values.yaml#L50-L52

So I kept digging and I found this example dev config, where there's a similar config with a secret embedded in it: https://github.com/dexidp/dex/blob/a857160fe91d1ab35ce37352b83e77fc922701cf/examples/config-dev.yaml#L106-L119

At least I'm assuming that's meant to be secret, but there's no specific instructions about it; should users set this to something specific to their installation, or is it meant to be randomized? The secret handling in general is kind of hands-off in these docs, I think with good reason, because otherwise we're going to find ourselves re-explaining how to properly handle a secret in every doc that uses secrets. But in general I think this is also another topic we'll need to address, (the docs shouldn't gloss over things without making a callout to the appropriate reference doc that explains the topic of "proper secret handling")

I'm just going to go ahead and follow the instructions, assuming that part shouldn't be modified, because I guess that Weave GitOps has a hardcoded secret in it for dex, but I'm sure that's not true, if there's a doc that covers this part and I missed it, please whack me with a cluebat ๐Ÿ™‡ ๐Ÿ™

kingdonb commented 1 year ago

Add to this, I think this page is out of date:

https://docs.gitops.weave.works/docs/configuration/tls/

I don't find serverTLS among the values that I can choose from here: https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/values.yaml

(A separate issue is that the repo for weave-gitops-enterprise chart is not public, so I'm not sure how users are supposed to discover what other values they can input there, if they need something that isn't specifically explained in a particular doc.)

I tried enabling ingress as it was explained, and I got persistent errors from the backend which is apparently still serving TLS

Reading enough different sources, I am eventually able to figure out that I need

values:
  tls:
    enabled: false

I finally landed at an enabled ingress which reaches the backend successfully and is doing TLS termination, but the error now is from clicking on the OIDC login button:

{"message":"oidc provider not configured","code":400}

Aha. I was thinking there must be something missing, (this is another bit that's evidently not covered in the doc.)

I'll see what I can figure out by poking around in values.yaml, I see the docs have prompted me to set up GitHub so it knows how to find Weave GitOps callback, but I don't see where I tell Weave GitOps which oauth provider or how to contact it.

kingdonb commented 1 year ago

I eventually got it working through sheer force of will. I figured out eventually that the staticClients information in Dex's configuration should match what Weave GitOps Enterprise's clientCredentialsSecret has listed under clientID and clientSecret (not clientId as the values.yaml had indicated, that was very difficult to figure everything out.)

I'm going to do my best to forget what I learned here so I can still viably test the workflow after improving the docs, because there was definitely a bit missing and I had to make a few flying leaps to get to a working config.

Most surprising of all, when I created a team in my GitHub org, and added myself to it, and Dex's configuration was finally correct, I got an error stating that I was in the correct org but not any teams. Only when I added a sub-team to the team I had created, did Dex let me in. I am not sure why the top-level team did not count as a team, but I did eventually get everything connected, (Can see what it looks like when RBAC is disconnected - perhaps this should be a stop on the tour, to highlight how RBAC and Dex work - I got locked out of Kubernetes even though I was able to get logged into Weave GitOps, because I thought that I could just change the team name, but that was mismatched vs what had been configured in RBAC!)

Finally, at the end of all this, I think that I should be able to use Dex to access the cluster directly, with whatever roles I've been given via RBAC... but I have unfortunately no idea how to do that, or where to look next for more details about that.

I did get RBAC connected via Dex today though, so ๐ŸŽ‰ this is really cool, and I feel like I solved a massive puzzle box!

kingdonb commented 1 year ago

Some breaking changes and some improvements to the OIDC workflow seem to have landed this release (0.16)

Today I found my upgraded WG instances in crashloop, because I had disabled the admin user creation but enabled the OIDC. IDK if that is a bug to report, but when I replaced the admin user (which was not possible to have in place at the same time as OIDC configuration, in the prior release) everything started working again!

I think I can talk a bit more broadly about the OIDC user experience now, after doing the full setup for WGE and seeing what that's like with a multi-cluster configuration (the rbac module in clusters/bases and how that's used to enable WGE admin access across clusters) I'm spending some time with the WG OSS now, to get a feel for how that experience is like when you are a basic user and your platform team provisioned you just one cluster, with WG OSS installed on it.

I didn't want to spend a bunch of time mucking around in the Dex installation guide before I could say I've done all that. But now I have done all that. I'm not comfortable taking the lead on this right now, but if anyone is working on this doc and has questions, I can definitely help with reviews or otherwise!

makkes commented 1 year ago

@kingdonb would you mind pasting the logs from the crash-looping pod? I have OIDC configured on my personal cluster and the upgrade went through just fine.

kingdonb commented 1 year ago

@makkes Do you have an admin user defined on your WG instance alongside of OIDC? Those were the conditions (not having any wego-admin user defined) that led my instance to crash looping, I only had it set up this way because before, it seemed that I could not get OIDC working while the admin user was also defined. I guessed they fixed it, and just undid what I had done. I don't think there is any bug, but I'll see if I can reproduce the crash, as I've lost the logs you're looking for now and they might still be relevant while this is at least this fresh in my memory...

kingdonb commented 1 year ago

Here you go:

(โŽˆ|howard-space:flux-system):~ (macos-homebrew *|u=)$ k logs -f ww-gitops-weave-gitops-7f55bd8fcd-m9tng
2023-02-06T22:50:11.619Z    INFO    gitops  cmd/cmd.go:130  Version {"version": "v0.16.0", "git-commit": "8d8a6165", "branch": "HEAD", "buildtime": "2023-02-01_12:33:34"}
I0206 22:50:12.672255       1 request.go:682] Waited for 1.012906257s due to client-side throttling, not priority and fairness, request: GET:https://10.108.113.47:443/apis/authorization.k8s.io/v1?timeout=32s
Error: could not initialise authentication server: could not create auth server: could not get secret for cluster user, secrets "cluster-user-auth" not found
Usage:
   [flags]

Flags:
      --auth-methods strings                     Which auth methods to use, valid values are user-account,oidc (default [user-account,oidc])
      --custom-oidc-scopes strings               Customise the requested scopes for then OIDC authentication flow - openid will always be requested (default [openid,offline_access,email,groups])
      --enable-metrics                           Starts the metrics listener
  -h, --help                                     help for this command
      --host string                              UI host (default "0.0.0.0")
      --insecure                                 do not attempt to read TLS certificates
      --log-level string                         log level (default "info")
      --metrics-address string                   If the metrics listener is enabled, bind to this address (default ":2112")
      --mtls                                     disable enforce mTLS
      --notification-controller-address string   the address of the notification-controller running in the cluster
      --oidc-client-id string                    The client ID for the OpenID Connect client
      --oidc-client-secret string                The client secret to use with OpenID Connect issuer
      --oidc-groups-claim string                 JWT claim to use as the user's group. If the claim is present it must be an array of strings (default "groups")
      --oidc-issuer-url string                   The URL of the OpenID Connect issuer
      --oidc-redirect-url string                 The OAuth2 redirect URL
      --oidc-secret-name string                  Name of the secret that contains OIDC configuration (default "oidc-auth")
      --oidc-token-duration duration             The duration of the ID token. It should be set in the format: number + time unit (s,m,h) e.g., 20m (default 1h0m0s)
      --oidc-username-claim string               JWT claim to use as the user name. By default email, which is expected to be a unique identifier of the end user. Admins can choose other claims, such as sub or name, depending on their provider (default "email")
      --path string                              Path url
      --port string                              UI port (default "9001")
      --tls-cert-file string                     filename for the TLS certificate, in-memory generated if omitted
      --tls-private-key-file string              filename for the TLS key, in-memory generated if omitted
      --use-k8s-cached-clients                   Enables the use of cached clients

Error: could not initialise authentication server: could not create auth server: could not get secret for cluster user, secrets "cluster-user-auth" not found

Again, I only disabled the admin user because it couldn't be used together with OIDC at the time, if I remember correctly. I can imagine a user who would want to disable the admin user, but if I set values.adminUser.create: false this is the result.

Maybe that's a bug?

makkes commented 1 year ago

Aah, I see what happened there and the chart docs/values aren't explicit about it: If you want to exclusively rely on OIDC you need to disable the Secret-based authentication by setting this flag on the weave-gitops Deployment:

--auth-methods=oidc

You achieve this by supplying that flag through the additionalArgs Helm chart value:

kind: HelmRelease
[...]
spec:
  values:
    additionalArgs:
    - --auth-methods=oidc

This effectively disables the Secret-based authentication and should lead to your Pod to come up.

kingdonb commented 1 year ago

That definitely worked, but all that being necessary was unexpected and as you say slightly undocumented. I'm not sure how it should work, but I'd expect this additionalArg to get added automatically when I set adminUser.create: false because that setting change sets the expectation that there is no admin user, or password authentication at all (for me at least.)

If there isn't any way to set that expectation on the behavior clearly for the user, then I guess I'd expect there to be one. On top of that, if password authentication is disabled I'd expect to see the username and password prompts disappear as well.

There is a workaround as you mentioned so at least I don't have to set an admin password anymore, and now we're underway with a refresh of the Dex + OIDC docs, so this all can be addressed by that effort! Thanks for this ๐Ÿ™

makkes commented 1 year ago

I agree with all that you're saying. Unfortunately it's a little more complicated and I'd love to strip the chart of some of its complexity soon. I'll keep that on my radar and create a follow-up issue for it this week.

When you set adminUser.create: false then this does not necessarily mean "I want to completely disable user/password authentication" because it can also mean "I'm managing the cluster-user-auth Secret myself". The bool flag doesn't give you the chance to specify your expectation around that, though, so as it is implemented now the assumption is it means the latter.

I would like to avoid adding another boolean flag for disable username/password authentication and rather make the whole authentication configuration much more comprehensible.

You can get a glimpse of how complex the chart has gotten in this PR that merely makes the Secret name configurable. ๐Ÿ˜‰

Callisto13 commented 1 year ago

cc #3389