mitodl / ol-infrastructure

Infrastructure automation code for use by MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Manage Keycloak configuration objects via Pulumi #1332

Closed blarghmatey closed 1 year ago

blarghmatey commented 1 year ago

User Story

As a platform engineer I want to manage the configuration of our Keycloak service as code

Description/Context

Keycloak has an available Pulumi plugin for managing the different configuration objects (e.g. realms, integrations, etc.). We want to use this to keep our different Keycloak installations aligned with each other so that they don't suffer configuration drift through overuse of "click-ops".

Acceptance Criteria

Functional Requirements

shaidar commented 1 year ago

The Keycloak configs are in this gist https://gist.github.com/collinpreston/b13af969bd90ff166df7f4624678a069. Some of the settings are loose given that it's a POC and will need to be tightened prior to rolling out to QA and Prod. Started going through those configs and finding their equivalents in Pulumi. As always, the Keycloak docs require some fishing around to find out how each setting can be configured.

shaidar commented 1 year ago

Set up a local environment with Keycloak and Vault to test out auth in Vault using Keycloak. Added a new realm, separate from the master realm that is created as part of the Keycloak deployment. Created a few users with different permissions in Vault and was able to log in to Vault using Keycloak. Ran into a minor issue where the Keycloak pop-up was blocked by the browser, with no warning or anything hinting at that, so I might look at some options there, but it's not a major issue for now. Today, will be setting up another realm and testing different permissions, and then writing a Pulumi PR for a basic Keycloak config with multiple realms.

shaidar commented 1 year ago

Just thinking out loud a bit. For the realm configs, we're most likely gonna end up with different settings for each realm, so I think what I'm gonna do is, instead of just looping over a list to create realms, have individual realms created with their own settings. If down the line we realize the settings are the same, we can consolidate, but my sense is that the realms are gonna have a good amount of differences. So for now, I'll start with a skeleton PR for the Platform Engineering realm, and then I can translate most of what we have for the App realm into Pulumi code.

blarghmatey commented 1 year ago

:+1: sounds good to me. Maybe have it as its own Pulumi project then? Anything that looks like shared or shareable logic we can move into a lib helper for implementing additional realm projects.

shaidar commented 1 year ago

Think it's a bit too early in the process, but agree in principle. Ideally, we should have a baseline standard shared across realms and then a few different tweaks on the individual realms, but it will take a while to establish that, since we need more experience with the product first.

shaidar commented 1 year ago

Submitted a work-in-progress PR for the work I did yesterday on the realm that will be used by platform engineering services - https://github.com/mitodl/ol-infrastructure/pull/1411. Spent some time trying to figure out how to add multiple password policies and configure them, as the example wasn't very clear, but it appears to be just a matter of passing multiple policies in the same key, which then gets parsed. Today need to work on:

shaidar commented 1 year ago

Went down a bit of a rabbit hole trying to figure out the password_policy key and its allowed/accepted values. For example, the Keycloak server has a password policy in the UI called Not Recently Used. If I want to use that value in password_policy, should it be notRecentlyUsed(3)? I looked through the Keycloak docs searching for anything under PasswordPolicy but didn't come across that value. I was also wondering, if I were to use the specialChars value, how I'd go about passing its attributes inside the password_policy string itself. Another similar rabbit hole was the browser_flow and enabling OTP by default. So basically just working through translating Keycloak configs into Pulumi code.
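A minimal sketch of how such a policy string could be assembled, assuming the format shown in the provider's example: policy entries written as `name(value)` (or a bare `name` for policies that take no value) and joined with ` and `. The helper name `build_password_policy` is hypothetical. The identifier `passwordHistory` for the UI's "Not Recently Used" policy is an assumption here and worth confirming against a realm export before relying on it.

```python
from typing import Mapping, Optional, Union

# A policy value is a number/string argument, or None for policies without one.
PolicyValue = Optional[Union[int, str]]


def build_password_policy(policies: Mapping[str, PolicyValue]) -> str:
    """Join password policies into one string for the password_policy key.

    Each entry becomes "name(value)", or just "name" when the value is None,
    and the entries are joined with " and " in insertion order.
    """
    parts = []
    for name, value in policies.items():
        parts.append(name if value is None else f"{name}({value})")
    return " and ".join(parts)


policy = build_password_policy(
    {
        "length": 12,
        "specialChars": 1,  # require at least one special character
        "passwordHistory": 3,  # ASSUMED identifier for "Not Recently Used"
        "notUsername": None,  # takes no argument
    }
)
print(policy)
# length(12) and specialChars(1) and passwordHistory(3) and notUsername
```

The resulting string would then be passed as the `password_policy` value on the realm. Keeping the policies in a mapping makes per-realm differences easier to review than a long hand-written string.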

shaidar commented 1 year ago

Finished adding the remaining pieces to set up an initial realm that can be used as a POC for DevOps services, and then tried deploying a new QA Keycloak stack so as not to override any work the devs had done on the CI stack. Most of the resources deployed, but I ran into an issue with getting the RDS credentials configured in the Keycloak app so it can talk to the db. Tried launching the stack a few times in case it was a temporary issue, but the issue persisted. Reached out to Mike, since he had previously worked on setting up the CI stack and wrote the Pulumi code, and he mentioned something about having to create the AppRole manually before the app will work. Will need to follow up with him on Monday/Tuesday to get a better understanding of the step I'm missing. Once I have the QA stack up and running, I can test out my realm configs and see what needs to be tweaked.

shaidar commented 1 year ago

Working on setting up GitHub as the identity provider, with Keycloak being the identity broker. Created the OAuth app on the GitHub side for QA and worked on the Pulumi Keycloak config values. There's a Google OIDC identity provider in the Pulumi package, but not one for GitHub, so I spent some time figuring out what the extra_config values need to be (if any). Will try things out with our current local setup before codifying it in Pulumi and testing it on QA.

shaidar commented 1 year ago

On Wednesday, spent time populating the different secrets required by the Keycloak config to set up GitHub as an IdP. Ran into some issues on my local machine with poetry/pyenv/python. Poetry was using Python 3.10, but we've upgraded to Python 3.11. After installing the new Python version and setting it as the default through the pyenv shim, Poetry was still not using the new version, and I kept running into issues trying to run Pulumi.

shaidar commented 1 year ago

Ran into some issues on my local machine with poetry/pyenv/python 3.11, and once I got those resolved, I tried to test the Pulumi changes in my branch on Vault QA. Ran into an issue with the auth from Pulumi to Keycloak, which Pulumi needs in order to connect and actually deploy the desired changes. When I run pulumi up on the Keycloak substructure, I get the following error:

keycloak:index:Realm (ol-platform-engineering):
    error: could not validate provider configuration: 5 errors occurred:
        * Invalid or unknown key
        * Invalid or unknown key
        * Invalid or unknown key
        * Invalid or unknown key
        * Invalid or unknown key

Tried commenting out some things to figure out where the error is coming from. I'm assuming that it's the auth to Keycloak, so I'm gonna try a basic config to at least make sure the auth to Keycloak is working, and then I can test the rest of the code.
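One way to check the credentials independently of Pulumi is to request a token directly from Keycloak. The sketch below uses only the standard library and assumes the client has service accounts enabled (client credentials grant). The host, realm, and client names are placeholders, and older Keycloak distributions serve these endpoints under an additional `/auth` prefix.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(
    base_url: str, realm: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build a client-credentials token request for a Keycloak realm."""
    token_url = (
        f"{base_url.rstrip('/')}/realms/{urllib.parse.quote(realm)}"
        "/protocol/openid-connect/token"
    )
    # Form-encoded body, as expected by the OpenID Connect token endpoint.
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(token_url, data=body, method="POST")


def fetch_access_token(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the access token from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["access_token"]


# Example usage (placeholder values; performs a real network call):
# request = build_token_request(
#     "https://keycloak.example.com", "master", "pulumi-client", "client-secret"
# )
# token = fetch_access_token(request)
```

If this returns a token, the client ID and secret are valid, which points the investigation at how the credentials reach the provider rather than at the credentials themselves.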

shaidar commented 1 year ago

Tested out auth to verify that the client_id and secret that I was using were correct. Made API calls to a few different endpoints and was able to get results back. It looked like Pulumi just wasn't reading the credentials from the stack properly for some reason. Chatted with Tobias, and he recommended setting the provider explicitly and using that throughout the code to get around the issue. Tested it out and it seems to be working, with Pulumi communicating with the Keycloak instance. Tested some of the code and was working through what appeared to be some minor issues towards the end of the day.
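A rough sketch of what setting the provider explicitly can look like in a Pulumi Python program, not the code from the PR. The config namespace `keycloak_admin` and its keys are hypothetical, and the imports sit inside the function only so the snippet can be loaded without the Pulumi SDKs installed; a real `__main__.py` would import them at the top.

```python
def define_realm() -> None:
    """Declare a realm that uses an explicitly configured Keycloak provider."""
    # Lazy imports: requires the pulumi and pulumi_keycloak packages at call time.
    import pulumi
    import pulumi_keycloak as keycloak

    # HYPOTHETICAL config namespace and keys; adjust to the stack's own config.
    config = pulumi.Config("keycloak_admin")

    # Explicit provider instance instead of relying on default provider config.
    provider = keycloak.Provider(
        "keycloak-provider",
        url=config.require("url"),
        client_id=config.require("client_id"),
        client_secret=config.require_secret("client_secret"),
    )

    # Every Keycloak resource opts in to the explicit provider.
    keycloak.Realm(
        "ol-platform-engineering",
        realm="ol-platform-engineering",
        enabled=True,
        opts=pulumi.ResourceOptions(provider=provider),
    )
```

Calling `define_realm()` from the project's `__main__.py` registers both resources during `pulumi up`. Passing the same `opts` to each additional Keycloak resource keeps them all on one provider instance.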

shaidar commented 1 year ago

Spent time figuring out how to set up roles in Keycloak and restrict access to a specific group/team from GitHub. Also was trying to test out some of the password policies and TOTP. Given that there are no group restrictions yet, I'm concerned that I need to do most of the testing locally so as not to inadvertently expose Vault QA secrets.