uselagoon / lagoon

Lagoon, the developer-focused application delivery platform
https://docs.lagoon.sh/
Apache License 2.0

Allow drush sql-sync from production during post-rollout tasks #1267

Closed · Schnitzel closed this 2 years ago

Schnitzel commented 5 years ago

With Lagoon 1.0.0 and RBAC, all post-rollout tasks and all remote shell connections run under a user that has the guest role. The guest role does not have permission to use the remote shell, so running drush sql-sync within a post-rollout task or inside a remote shell connection will fail.

This problem also existed for Lagoon Tasks, but we solved it there by injecting a maintainer token into the task pod: https://github.com/amazeeio/lagoon/issues/1229

For the problem here there are multiple possible solutions:

  1. Give the remote shell connection a role that can use remote shell connections (like maintainer, or create a new role altogether). While this solves the underlying problem and has already been tested, it would allow a user with no access to the production environment to elevate their own privileges by connecting to the development environment and "stealing" the SSH key of the default user, which has this additional role. That would allow developers without production environment access to access such environments.

  2. Abandon running drush sql-sync during a post-rollout task altogether and instead write a small bash script that creates a Lagoon task which runs drush sql-sync, and have the bash script wait until that Lagoon task has finished (see the hedged sketch at the end of this comment). This would technically work but brings its own challenges:

    1. It would mean that all projects currently using drush sql-sync directly inside post-rollout tasks will fail.
    2. There is currently no example of such a bash script that can create a Lagoon task.
    3. Debugging failures of drush sql-sync will be much harder, as the command runs in another pod and the logs are a bit trickier to load.
    4. Lagoon tasks require an environment to be successfully deployed. In some cases the environment we create the task in might not be fully deployed yet.
  3. Abandon running drush sql-sync during a post-rollout task altogether and instead allow triggering Lagoon tasks via .lagoon.yml post-rollout tasks. The post-rollout tasks YAML schema is already prepared to run task types other than run. There we could directly do something like:

    post-rollout:
      - task:
          name: sync db from master to local
          type: drush_sql-sync
          source: '@master'
          destination: '@self'

    This would then automatically create a Lagoon task, wait until that task is done, and continue with the deployment. While this is probably much easier to use than running a bash script as in solution 2, there are still downsides:

    1. It would mean that all projects currently using drush sql-sync directly inside post-rollout tasks will fail.
    2. Many projects first check whether the database already exists on the current environment and only run the sync command if it does not, so we would probably need to provide something like:
      post-rollout:
        - task:
            name: sync db from master to local
            type: drush_sql-sync
            source: '@master'
            destination: '@self'
            only_if: tables=$(drush sqlq 'show tables;') && [ -z "$tables" ]

      which runs the task only if the defined script does not fail

  4. During any remote shell connection we could inject the Lagoon JWT of the user that is currently connecting into the destination of the connection (like the 'cli' container). This would technically remove the need to inject an SSH key into the cli container altogether. There are only a couple of issues to solve:

    1. Admins accessing the cli container directly via the OpenShift UI or via oc rsh would not have a JWT at all, as they are not connecting via the Lagoon remote shell system and therefore no JWT would be generated (this probably speaks in favour of keeping the SSH key of the default-user).
    2. The Lagoon build process runs under the default-user and authenticates against the Lagoon API as this user. We would need to give that user's role permission to access the production environment.

I personally think no. 4 is the best solution we could implement right now, as it makes the most sense for the remote shell to run with the same JWT as the user that connects to it (this basically mimics the SSH agent system).
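
For completeness, here is a minimal sketch of what the helper script in solution 2 could roughly look like. It assumes a GraphQL mutation and query shaped like taskDrushSqlSync and taskById with a status field on tasks; those names, arguments, status values and the environment IDs are assumptions or placeholders that would need to be checked against the actual Lagoon API schema:

    #!/bin/bash
    # Hypothetical helper: create a Lagoon task that runs drush sql-sync and
    # wait for it to finish. Endpoint, mutation/query names, argument names and
    # status values are assumptions to verify against the Lagoon GraphQL schema.
    set -euo pipefail

    API="${LAGOON_API:-https://api.lagoon.example.com/graphql}"   # assumed endpoint
    TOKEN="${LAGOON_TOKEN:?need a Lagoon API token}"

    graphql() {
      curl -s "$API" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $TOKEN" \
        -d "{\"query\": \"$1\"}"
    }

    # Create the task (source/destination environment IDs are placeholders).
    TASK_ID=$(graphql 'mutation { taskDrushSqlSync(sourceEnvironment: 1, destinationEnvironment: 2) { id } }' \
      | jq -r '.data.taskDrushSqlSync.id')

    # Poll until the task reaches a terminal state.
    while true; do
      STATUS=$(graphql "query { taskById(id: $TASK_ID) { status } }" | jq -r '.data.taskById.status')
      echo "task $TASK_ID status: $STATUS"
      case "$STATUS" in
        succeeded) exit 0 ;;
        failed|error) exit 1 ;;
      esac
      sleep 10
    done

The important part is the polling loop: the deployment would only continue once the task has reached a terminal state.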

Schnitzel commented 5 years ago

Possible solution:

Deployment post-rollout (drush sql-sync @production @self):

  1. Inject a maintainer refresh token into the Lagoon build config.
  2. Every post-rollout run (https://github.com/amazeeio/lagoon/blob/master/images/oc-build-deploy-dind/scripts/exec-tasks-run.sh) injects the refresh token into the exec: oc -n drupal-example-master exec cli-21-l4vqk -i -t -- sh -c "export LAGOON_REFRESH_TOKEN=TEEST7; sh" (see the sketch after this list)
  3. Drush boots inside the cli pod and runs an existing bash script that converts LAGOON_REFRESH_TOKEN into LAGOON_TOKEN (replacing the current generation via SSH: https://github.com/amazeeio/lagoon/blob/master/services/drush-alias/web/aliases.drushrc.php.stub#L178)
  4. Drush gets the list of all environments with LAGOON_TOKEN.
  5. Drush opens an SSH connection to the production environment. SSH picks up the LAGOON_TOKEN environment variable, authenticates at the remote shell SSH via the existing SSH key (which only has a guest role account) and passes LAGOON_TOKEN on to the Lagoon remote shell system (via SSH environment variable forwarding, SendEnv). The Lagoon remote shell system then uses the provided LAGOON_TOKEN, and not the one generated from the SSH public key, to check whether the user (i.e. the build system) has access to the production environment.
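
A rough sketch of steps 2 and 5, purely illustrative: the namespace and pod names are the placeholders from the example above, the remote shell endpoint and SSH user are assumptions, and forwarding LAGOON_TOKEN over SSH additionally requires the Lagoon remote shell sshd to allow it via AcceptEnv:

    # Step 2 (inside exec-tasks-run.sh): start the post-rollout shell with the
    # maintainer refresh token exported; namespace and pod name are placeholders.
    oc -n drupal-example-master exec cli-21-l4vqk -i -t -- \
      sh -c "export LAGOON_REFRESH_TOKEN=${LAGOON_REFRESH_TOKEN}; sh"

    # Step 5 (inside the cli pod): forward the already-generated LAGOON_TOKEN to
    # the Lagoon remote shell. Endpoint and SSH user are assumptions, and the
    # remote sshd needs "AcceptEnv LAGOON_TOKEN" for the forwarding to work.
    export LAGOON_TOKEN
    ssh -p 32222 -o SendEnv=LAGOON_TOKEN \
      drupal-example-master@ssh.lagoon.example.com 'drush sql-dump --gzip'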

Local drush sql-sync with a user having the developer role:

  1. Drush checks for LAGOON_TOKEN, does not find one, also does not find LAGOON_REFRESH_TOKEN, and therefore falls back to creating LAGOON_TOKEN via SSH (https://github.com/amazeeio/lagoon/blob/master/services/drush-alias/web/aliases.drushrc.php.stub#L178); see the sketch after this list.
  2. Drush exports the generated token as the LAGOON_TOKEN environment variable.
  3. Drush gets the list of all environments with LAGOON_TOKEN.
  4. Drush opens an SSH connection to the production environment. SSH picks up the LAGOON_TOKEN environment variable, authenticates at the remote shell SSH via the existing SSH key (which has the permissions of the current user = developer role) and passes LAGOON_TOKEN on to the Lagoon remote shell system (via SSH environment variable forwarding, SendEnv). The Lagoon remote shell system then uses the provided LAGOON_TOKEN, and not the one generated from the SSH public key, to check whether the user has access to the production environment (which will be denied).
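
A minimal sketch of the fallback in step 1, assuming the Lagoon SSH service's token command and the endpoint used by the current drush alias file (host and port are assumptions):

    # Only fall back to SSH-based token generation when no token was injected.
    if [ -z "${LAGOON_TOKEN:-}" ] && [ -z "${LAGOON_REFRESH_TOKEN:-}" ]; then
      # The Lagoon SSH service answers the "token" command with a JWT for the
      # SSH key that authenticated; host and port are assumptions.
      LAGOON_TOKEN=$(ssh -p 32222 lagoon@ssh.lagoon.amazeeio.cloud token)
      export LAGOON_TOKEN
    fi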

Lagoon Drush SQL-Sync Tasks:

  1. Inject a maintainer LAGOON_REFRESH_TOKEN into the task pod.
  2. Drush boots inside the task pod and runs an existing bash script that converts LAGOON_REFRESH_TOKEN into LAGOON_TOKEN (replacing the current generation via SSH: https://github.com/amazeeio/lagoon/blob/master/services/drush-alias/web/aliases.drushrc.php.stub#L178); a sketch of this token exchange follows after the list.
  3. Drush gets the list of all environments with LAGOON_TOKEN.
  4. Drush opens an SSH connection to the production environment. SSH picks up the LAGOON_TOKEN environment variable, authenticates at the remote shell SSH via the existing SSH key (which only has a guest role account) and passes LAGOON_TOKEN on to the Lagoon remote shell system (via SSH environment variable forwarding, SendEnv). The Lagoon remote shell system then uses the provided LAGOON_TOKEN, and not the one generated from the SSH public key, to check whether the user (i.e. the task system) has access to the production environment.
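
The LAGOON_REFRESH_TOKEN to LAGOON_TOKEN conversion in step 2 (and in the build flow above) could be a standard OAuth2 refresh-token grant against Lagoon's Keycloak; the Keycloak base URL, realm name and client id below are assumptions:

    # Exchange the injected refresh token for a short-lived access token.
    # Keycloak base URL, realm name and client id are assumptions.
    KEYCLOAK_URL="${KEYCLOAK_URL:-https://keycloak.lagoon.example.com}"
    LAGOON_TOKEN=$(curl -s \
      "${KEYCLOAK_URL}/auth/realms/lagoon/protocol/openid-connect/token" \
      -d grant_type=refresh_token \
      -d client_id=lagoon-ui \
      -d refresh_token="${LAGOON_REFRESH_TOKEN}" \
      | jq -r '.access_token')
    export LAGOON_TOKEN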

Lagoon SQL-Dump Tasks:

  1. Inject a maintainer LAGOON_REFRESH_TOKEN into the task pod.
  2. After drush sql-dump has finished, the task script uses Lagoon's getLagoonAccessToken.sh to generate an access token from LAGOON_REFRESH_TOKEN.
  3. The task script uploads the files into the Lagoon API with LAGOON_TOKEN (this means we can remove the permission to upload files from guests); a hedged sketch follows below.
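
A sketch of steps 2 and 3, assuming getLagoonAccessToken.sh prints the token to stdout and assuming the API accepts GraphQL multipart uploads with a mutation roughly shaped like uploadFilesForTask; the mutation name, input fields, task id and file path are assumptions or placeholders to verify against the schema:

    # Turn the injected refresh token into an access token
    # (assumes the helper prints the token to stdout).
    LAGOON_TOKEN=$(./getLagoonAccessToken.sh)

    # Upload the dump to the Lagoon API following the GraphQL multipart request
    # spec. Mutation name, input shape, task id and file path are assumptions.
    curl -s "${LAGOON_API:-https://api.lagoon.example.com/graphql}" \
      -H "Authorization: Bearer ${LAGOON_TOKEN}" \
      -F operations='{"query":"mutation ($task: Int!, $files: [Upload]!) { uploadFilesForTask(input: { task: $task, files: $files }) { id } }","variables":{"task":123,"files":[null]}}' \
      -F map='{"0":["variables.files.0"]}' \
      -F 0=@/tmp/drupal-dump.sql.gz
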
tobybellwood commented 2 years ago

This is now possible - all branches have a maintainer token, so they can access other branches.