vexxhost / atmosphere

Simple & easy private cloud platform featuring VMs, Kubernetes & bare-metal

feature request: automatic secret rotation #521

Open fitbeard opened 1 year ago

fitbeard commented 1 year ago

Right now, oslo.config has a driver (enabled by default) that allows reading configuration from environment variables:
- https://opendev.org/openstack/oslo.config/commit/ea8a0f6a8b260474151fb27c2adc9dcc88774850
- https://specs.openstack.org/openstack/oslo-specs/specs/rocky/config-from-environment.html
- https://docs.openstack.org/oslo.config/latest/reference/drivers.html

The idea is to use the HashiCorp Vault agent (as a sidecar) for all containers, together with https://github.com/vexxhost/vault-plugin-secrets-openstack and other Vault secrets engines, to rotate all secrets via Vault.
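The environment driver looks up an option under a variable named after its group and option, e.g. OS_DATABASE__CONNECTION for [database]/connection, the same names used in the template further down in this thread. A small sketch of that naming scheme, assuming group and option names are simply upper-cased and that [DEFAULT] options use the group name DEFAULT (as described in the spec linked above). overrides_from_env is a hypothetical debugging helper, not part of oslo.config.

```python
from typing import Dict, Mapping, Optional, Tuple


def env_var_name(group: Optional[str], option: str) -> str:
    """Return the OS_<GROUP>__<OPTION> name checked for a config option.

    Assumption: mirrors the naming scheme in the oslo.config spec; options
    registered without a group live in DEFAULT.
    """
    return "OS_{}__{}".format((group or "DEFAULT").upper(), option.upper())


def overrides_from_env(environ: Mapping[str, str]) -> Dict[Tuple[str, str], str]:
    """Collect (group, option) -> value pairs from OS_<GROUP>__<OPTION> variables.

    Hypothetical helper for checking what a container will see. Variables
    without the double underscore (e.g. client-side OS_AUTH_URL) are ignored.
    Names are lower-cased because the env names are upper-cased.
    """
    found: Dict[Tuple[str, str], str] = {}
    for key, value in environ.items():
        if not key.startswith("OS_") or "__" not in key:
            continue
        # Split at the first double underscore: group names may contain "_".
        group, _, option = key[len("OS_"):].partition("__")
        if not group or not option:
            continue
        group_name = "DEFAULT" if group == "DEFAULT" else group.lower()
        found[(group_name, option.lower())] = value
    return found
```

For example, overrides_from_env(os.environ) inside a Glance container would show which [database] or [keystone_authtoken] options the injected variables override.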

lukasmrtvy commented 1 year ago

Another use case for Vault related to the secrets would be as a Barbican backend: https://docs.openstack.org/barbican/latest/install/barbican-backend.html#vault-plugin

ricolin commented 1 year ago

What makes Vault a tricky case right now is that it's under the Business Source License 1.1 (https://github.com/hashicorp/vault/blob/main/LICENSE#L20), so it might not suit production use, and it would be a bit strange to add it to Atmosphere.

lukasmrtvy commented 1 year ago

https://news.ycombinator.com/item?id=37089944

mnaser commented 1 year ago

I spoke with HashiCorp and they are OK with us using Terraform in Atmosphere, but I will need to raise the question of Vault with them.

fitbeard commented 8 months ago

Hi. I've made huge progress on this issue: https://github.com/fitbeard/atmosphere/compare/vault_poc_pre_timestamp...vault_poc. Before pushing, I need to figure out how to change helm-toolkit and the OpenStack charts to add custom annotations to secrets, and how to skip values such as the region, the service username and password, and the MQ/DB connection strings.

fitbeard commented 8 months ago

Here is an example that can be used for init/db-sync-like containers:

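```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tadas-test
  namespace: openstack
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: vault
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vault
      # Bank-Vaults mutating webhook settings: Vault address, the Vault role
      # to log in as, the Secret holding Vault's CA certificate, and the
      # mount path of Vault's Kubernetes auth method.
      annotations:
        vault.security.banzaicloud.io/vault-addr: "https://vault.vault:8200"
        vault.security.banzaicloud.io/vault-role: "vault"
        vault.security.banzaicloud.io/vault-tls-secret: vault-tls
        vault.security.banzaicloud.io/vault-path: "kubernetes"
    spec:
      serviceAccountName: default
      containers:
      - name: alpine
        image: alpine
        # Test command: print both injected values, then stay up for 3 minutes.
        command: ["sh", "-c", "echo $AWS_SECRET_ACCESS_KEY - $CRED_SECRET && echo going to sleep... && sleep 180"]
        env:
        # "vault:<path>#<key>" values are resolved from Vault when the
        # container starts; the pod spec only holds the reference.
        # Static secret from a KV v2 engine mounted at secret/.
        - name: AWS_SECRET_ACCESS_KEY
          value: vault:secret/data/demosecret/aws#AWS_SECRET_ACCESS_KEY
        # Dynamic OpenStack application credential (path openstack/creds/vault).
        - name: CRED_SECRET
          value: vault:openstack/creds/vault#application_credential_secret
```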

For this to work, custom pod annotations are needed.
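The vault:<path>#<key> values in the Deployment above are references, not secrets: the webhook's vault-env reads each path from Vault and substitutes the key's value before running the container command. A minimal sketch of that substitution, where a plain dict stands in for Vault. parse_vault_ref, resolve_env and the sample data are hypothetical; the real work happens inside Bank-Vaults, which supports more syntax than the plain path#key form handled here.

```python
from typing import Dict, Mapping, Tuple

VAULT_PREFIX = "vault:"


def parse_vault_ref(value: str) -> Tuple[str, str]:
    """Split 'vault:<path>#<key>' into (path, key)."""
    if not value.startswith(VAULT_PREFIX):
        raise ValueError("not a vault reference: %r" % value)
    path, sep, key = value[len(VAULT_PREFIX):].partition("#")
    if not sep or not path or not key:
        raise ValueError("expected vault:<path>#<key>, got %r" % value)
    return path, key


def resolve_env(env: Mapping[str, str],
                vault_data: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Return env with each vault: reference replaced by the referenced value.

    vault_data maps a Vault path to the key/value data stored there and
    stands in for real Vault reads. Values without the prefix pass through.
    """
    resolved: Dict[str, str] = {}
    for name, value in env.items():
        if value.startswith(VAULT_PREFIX):
            path, key = parse_vault_ref(value)
            resolved[name] = vault_data[path][key]
        else:
            resolved[name] = value
    return resolved
```

With vault_data = {"openstack/creds/vault": {"application_credential_secret": "s3cr3t"}}, the CRED_SECRET reference from the Deployment resolves to "s3cr3t", while ordinary values are left untouched.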

fitbeard commented 8 months ago

To use the oslo.config environment driver, we need something like https://bank-vaults.dev/docs/mutating-webhook/vault-agent-templating/ or https://bank-vaults.dev/docs/mutating-webhook/consul-template/: a mechanism that can HUP the process and re-read the DB credentials / OpenStack application credentials from the environment, in order to use this: https://github.com/fitbeard/atmosphere/compare/vault_poc_pre_timestamp...vault_poc#diff-4c7dc20d92f312a2206ef00799d4cbbfa672faa124af97d882a80501b6f312f9R113
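One caveat on the HUP idea: environment variables are copied into a process when it starts and cannot normally be changed from outside afterwards, so a HUP by itself will not make oslo.config see rotated values that only exist in the environment. Whatever picks up new credentials has to start a new process with the new environment. A minimal sketch of such a wrapper, assuming the agent renders KEY="value" lines into a file; the file path, the EnvFileSupervisor class and the restart-on-HUP policy are hypothetical, not something Bank-Vaults or the charts provide.

```python
import os
import shlex
import signal
import subprocess
from typing import Any, Dict, List, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY="value" lines into a dict, skipping blank lines and comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, raw = line.partition("=")
        if not sep:
            continue
        # shlex.split removes the shell quoting the template emits.
        parts = shlex.split(raw)
        env[key.strip()] = parts[0] if parts else ""
    return env


class EnvFileSupervisor:
    """Run a command with variables from a rendered env file, restarting on demand."""

    def __init__(self, cmd: List[str], env_file: str, **popen_kwargs: Any) -> None:
        self.cmd = cmd
        self.env_file = env_file
        self.popen_kwargs = popen_kwargs
        self.child: Optional[subprocess.Popen] = None

    def start(self) -> subprocess.Popen:
        with open(self.env_file) as f:
            rendered = parse_env_file(f.read())
        # The child receives a copy of this environment; later edits to the
        # file are invisible to it until it is started again.
        self.child = subprocess.Popen(self.cmd, env={**os.environ, **rendered},
                                      **self.popen_kwargs)
        return self.child

    def restart(self, timeout: float = 30.0) -> subprocess.Popen:
        """Stop the current child (SIGTERM, then SIGKILL) and start a fresh one."""
        if self.child is not None and self.child.poll() is None:
            self.child.terminate()
            try:
                self.child.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.child.kill()
                self.child.wait()
        return self.start()

    def restart_on_sighup(self) -> None:
        """Install a SIGHUP handler (POSIX only) that triggers restart()."""
        signal.signal(signal.SIGHUP, lambda signum, frame: self.restart())
```

Inside a pod, the same effect is usually achieved by letting Kubernetes restart the container, which is the trade-off about "killing" pods discussed later in this thread.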

fitbeard commented 8 months ago

I am asking you to help me or at least advise me :)

fitbeard commented 8 months ago

https://review.opendev.org/c/openstack/openstack-helm/+/911090 -> https://review.opendev.org/c/openstack/openstack-helm/+/912022

fitbeard commented 6 months ago

Now I'm waiting for this change https://review.opendev.org/c/openstack/openstack-helm/+/916641 to be merged. We already have contributions from other contributors related to annotations for pods and jobs. This will unblock Vault integration tasks for a while.

gtirloni commented 6 months ago

> So right now what make Vault a weird case is it's under Business Source License 1.1 (https://github.com/hashicorp/vault/blob/main/LICENSE#L20) so might not suit production uses and a bit strange to add to Atmosphere.

IANAL, but I think that to violate the license terms, Atmosphere would have to ship with Vault, and whoever deploys Atmosphere would then have to use it to provide Vault services to customers, competing with offerings like HashiCorp Cloud Platform's Vault. In the worst case, it would be up to operators to decide what they are going to do with it.

We may want to consider if integrating OpenBao instead would be okay.

mnaser commented 6 months ago

@fitbeard Thanks for your progress here. I think at this point we have a few components that can be managed directly by Vault secrets engines:

  1. Database authentication: https://developer.hashicorp.com/vault/docs/secrets/databases/mysql-maria
  2. RabbitMQ authentication: https://developer.hashicorp.com/vault/docs/secrets/rabbitmq
  3. OpenStack authentication: https://github.com/vexxhost/vault-plugin-secrets-openstack

Now, I think deploying Vault and enabling those secrets engines is the easy part. The difficult part is actually relying on those values inside Atmosphere. Most OpenStack services don't really support reloading these values at runtime (or I don't think they'd do it very gracefully).

This leaves us with two choices:

With #2, we'd have to add a lot more resiliency, but the win we'd get out of the box is that, if we are confident about "killing" pods, we can start using autoscaling more effectively for stateless services (API services, etc.).

I am worried about stateful ones like L3 agents: those can take a long time to come back up if there are a lot of routers, and can cause an actual interruption...

Curious about your thoughts here.

fitbeard commented 6 months ago

@mnaser I will start testing granularly, with only one OpenStack service: Glance. First, I plan to configure the bootstrap/db-sync/init jobs (which use secrets only while they run) and fill dynamic values like RABBITMQ_CONNECTION or DB_CONNECTION with values from Vault through overrides such as ${vault:rabbitmq-glance/creds/role#username}:${vault:rabbitmq-glance/creds/role#password}, using https://bank-vaults.dev/docs/mutating-webhook/configuration/
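Inline ${vault:...} placeholders like the override above are filled in by substitution inside a larger string. One detail matters for the dynamic engines: every read of a creds/ path issues a new username/password pair, so the username and password in one string must come from the same read. A sketch of that substitution that reads each distinct path once; interpolate and the fake reader are hypothetical, and this is not a description of how Bank-Vaults implements it.

```python
import re
from typing import Callable, Dict, Mapping

# Matches ${vault:<path>#<key>}; path and key may not contain '#' or '}'.
VAULT_PLACEHOLDER = re.compile(r"\$\{vault:([^#}]+)#([^}]+)\}")


def interpolate(template: str,
                read_secret: Callable[[str], Mapping[str, str]]) -> str:
    """Replace each ${vault:path#key} in template with the key's value.

    Each distinct path is read once per call: with dynamic engines every
    read issues fresh credentials, so reading the path twice would pair the
    username from one lease with the password from another.
    """
    cache: Dict[str, Mapping[str, str]] = {}

    def lookup(match: "re.Match[str]") -> str:
        path, key = match.group(1), match.group(2)
        if path not in cache:
            cache[path] = read_secret(path)
        return cache[path][key]

    return VAULT_PLACEHOLDER.sub(lookup, template)
```

In a test, read_secret can be a fake that returns a new username/password pair on every call, which makes a mismatched pair easy to spot.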

fitbeard commented 6 months ago

On service reloading, I still don't have an answer, but I'm hoping this can be used somehow: https://bank-vaults.dev/docs/mutating-webhook/vault-agent-templating/#use-vault-ttls. It goes without saying that this will need improvements on the chart side.
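If services are restarted rather than reloaded, the lease TTL says how long the current credentials stay valid. A sketch of one possible restart policy, not an existing Bank-Vaults or chart feature: restart after a fixed fraction of the TTL plus random jitter, so that replicas do not all restart at the same moment, while always staying before expiry. The Lease type, restart_at and its default fraction/jitter are assumptions; lease_duration mirrors the field of that name in Vault's API responses.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Minimal view of a Vault lease."""
    issued_at: float      # Unix timestamp when the credentials were read
    lease_duration: int   # TTL in seconds, as in Vault's lease_duration field


def restart_at(lease: Lease, fraction: float = 0.7, jitter: float = 0.1,
               rng: Optional[random.Random] = None) -> float:
    """Return the Unix time at which to restart with fresh credentials.

    Hypothetical policy: wait fraction * TTL, plus up to jitter * TTL chosen
    at random. Requiring fraction + jitter < 1 keeps the restart before the
    lease expires.
    """
    if not (fraction > 0 and jitter >= 0 and fraction + jitter < 1):
        raise ValueError("need fraction > 0, jitter >= 0 and fraction + jitter < 1")
    rng = rng or random.Random()
    offset = lease.lease_duration * (fraction + rng.uniform(0, jitter))
    return lease.issued_at + offset
```

Jitter spreads restarts across replicas; it does not make a slow-to-recover agent start faster, so those would still need separate handling.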

fitbeard commented 6 months ago

And here is a Vault agent template (with Ansible Jinja compatibility) that works with the oslo.config environment driver, for "sourcing":

```jinja
{% raw %}{{- with secret "rabbitmq/creds/admin" -}}
{% endraw %}
OS_DEFAULT__TRANSPORT_URL="rabbit://{% raw %}{{ .Data.username }}:{{ .Data.password }}{% endraw %}@{{ rabbit_hosts | join(":5671," + "{{ .Data.username }}:{{ .Data.password }}" + "@") }}:5671"
{% raw %}{{ end }}
{% endraw %}
{% raw %}{{- with secret "openstack-test/creds/admin" -}}
{% endraw %}
OS_KEYSTONE_AUTHTOKEN__APPLICATION_CREDENTIAL_ID={% raw %}"{{ .Data.application_credential_id }}"
{% endraw %}
OS_KEYSTONE_AUTHTOKEN__APPLICATION_CREDENTIAL_SECRET={% raw %}"{{ .Data.application_credential_secret }}"
{% endraw %}
{% raw %}{{ end }}
{% endraw %}
{% raw %}{{- with secret "db/creds/admin" -}}
{% endraw %}
OS_DATABASE__CONNECTION="mysql+pymysql://{% raw %}{{ .Data.username }}:{{ .Data.password }}{% endraw %}@{{ db_host }}/glance?charset=utf8&ssl_ca=/etc/ssl/certs/{{ xxx }}/ca.pem"
{% raw %}{{- end -}}
{% endraw %}
```
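The same values can be built in Python, for instance to unit-test the shape of what the template renders. In this sketch the host names, database name and CA path are placeholders supplied by the caller (the template's elided CA directory stays a parameter). Unlike the template, it percent-encodes the credentials, which is a no-op for plain alphanumeric values but keeps characters such as @ or / from breaking the URL. env_lines uses shlex.quote, so its output is meant to be sourced by a POSIX shell.

```python
import shlex
from typing import Mapping, Sequence
from urllib.parse import quote


def userinfo(username: str, password: str) -> str:
    """'user:password' with both parts percent-encoded for use in a URL."""
    return "{}:{}".format(quote(username, safe=""), quote(password, safe=""))


def transport_url(username: str, password: str, hosts: Sequence[str],
                  port: int = 5671) -> str:
    """rabbit://u:p@h1:5671,u:p@h2:5671, the same shape the template's join() builds."""
    auth = userinfo(username, password)
    return "rabbit://" + ",".join("{}@{}:{}".format(auth, host, port) for host in hosts)


def db_connection(username: str, password: str, host: str, database: str,
                  ssl_ca: str) -> str:
    """SQLAlchemy URL with the same layout as the template's OS_DATABASE__CONNECTION."""
    return "mysql+pymysql://{}@{}/{}?charset=utf8&ssl_ca={}".format(
        userinfo(username, password), host, database, ssl_ca)


def env_lines(values: Mapping[str, str]) -> str:
    """Render KEY=value lines, shell-quoted so the file can be sourced."""
    return "".join("{}={}\n".format(key, shlex.quote(value))
                   for key, value in values.items())
```

For example, env_lines({"OS_DEFAULT__TRANSPORT_URL": transport_url("user", "pass", ["rabbit-0", "rabbit-1"])}) produces a single line that oslo.config's environment driver would read as [DEFAULT]/transport_url once sourced.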