uselagoon / build-deploy-tool

Tool to generate build resources

Side-effect / Breaking Change for persistent volumes in additional volumes change #382

Open sjerdo opened 6 days ago

sjerdo commented 6 days ago

It seems like the change in #339 introduced a breaking change in the deployment templating for basic-persistent services whose volume name does not equal the service name.

For example, one of our projects has the following service defined in docker-compose.yml. Note the difference between the service name and the value of lagoon.persistent.name:

services:
  rabbitmq:
    image: uselagoon/rabbitmq
    labels:
      lagoon.service.port: 5672
      lagoon.service.usecomposeports: true
      lagoon.autogeneratedroute: false
      lagoon.type: basic-persistent
      lagoon.persistent: /var/lib/rabbitmq
      lagoon.persistent.name: rabbitmq-data
      # Note: 5Gi is the default size for persistent storage
      lagoon.persistent.size: 1Gi
    environment:
      <<: *default-environment
    ports:
      - '5672'
      - '15672:15672'

Previously, this generated the following pvc config:

# Source: basic-persistent/templates/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-data
  labels:
    helm.sh/chart: basic-persistent-0.1.0
    app.kubernetes.io/name: basic-persistent
    app.kubernetes.io/instance: rabbitmq
    app.kubernetes.io/managed-by: Helm
    lagoon.sh/service: rabbitmq
    lagoon.sh/service-type: basic-persistent
    lagoon.sh/project: project
    lagoon.sh/environment: production
    lagoon.sh/environmentType: production
    lagoon.sh/buildType: promote
  annotations:
    k8up.syn.tools/backup: "true"
    lagoon.sh/version: "24.5.1"
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: bulk
  resources:
    requests:
      storage: "1Gi"

However, in recent builds this PVC is no longer generated. This causes a failure when deploying a new environment in this project, since the non-existent PVC cannot be mounted to the service.

The build-deploy-tool should probably include some validation to check whether the volume specified in the lagoon.persistent.name label exists before rolling out.
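
Roughly something like the following sketch. All names here are made up for illustration (they are not the actual build-deploy-tool types); the idea is just to compare each lagoon.persistent.name against the set of PVCs the build is actually going to template, and fail early:

package main

import "fmt"

// Hypothetical, simplified view of a service as parsed from
// docker-compose.yml; not the real build-deploy-tool types.
type Service struct {
	Name           string // compose service name, e.g. "rabbitmq"
	Type           string // lagoon.type, e.g. "basic-persistent"
	PersistentName string // lagoon.persistent.name, e.g. "rabbitmq-data"
}

// validatePersistentNames returns an error if a service references a
// persistent volume name for which no PVC will be templated.
func validatePersistentNames(services []Service, templatedPVCs map[string]bool) error {
	for _, s := range services {
		if s.PersistentName == "" {
			continue
		}
		if !templatedPVCs[s.PersistentName] {
			return fmt.Errorf(
				"service %q sets lagoon.persistent.name=%q, but no PVC with that name will be created",
				s.Name, s.PersistentName,
			)
		}
	}
	return nil
}

func main() {
	services := []Service{
		{Name: "rabbitmq", Type: "basic-persistent", PersistentName: "rabbitmq-data"},
	}
	// After #339, only a PVC named after the service itself is templated.
	templatedPVCs := map[string]bool{"rabbitmq": true}

	if err := validatePersistentNames(services, templatedPVCs); err != nil {
		fmt.Println("build validation failed:", err) // fail before rolling anything out
	}
}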

To fix our setup for current environments, we can't just switch to the basic service type, since changing service types for existing services is not allowed. We also can't just change the persistent volume name to rabbitmq, since that could cause data loss.

A workaround in the current situation would be to move the rabbitmq-data volume to the additional volumes section and to add a persistent volume matching the service name, which we won't use. For example:

services:
  rabbitmq:
    image: uselagoon/rabbitmq
    labels:
      lagoon.service.port: 5672
      lagoon.service.usecomposeports: true
      lagoon.autogeneratedroute: false
      lagoon.type: basic-persistent
      lagoon.persistent: /tmp/not-used
      lagoon.persistent.name: rabbitmq
      # Note: 5Gi is the default size for persistent storage
      lagoon.persistent.size: 10Mi
      lagoon.volumes.rabbitmq-data.path: /var/lib/rabbitmq
    environment:
      <<: *default-environment
    ports:
      - '5672'
      - '15672:15672'
    volumes:
      - 'rabbitmq-data:/var/lib/rabbitmq'

volumes:
  rabbitmq-data:
    labels:
      lagoon.type: persistent
      # Note: 5Gi is the default size for persistent storage
      lagoon.persistent.size: 1Gi
      lagoon.backup: false

shreddedbacon commented 6 days ago

I'm not sure the workaround you proposed will work; additional volumes are created differently from the default volumes offered by a service, so it won't be the same data volume that was previously created.

sjerdo commented 3 days ago

This was the original Helm template for the service:

# Source: basic-persistent/templates/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-data
  labels:
    helm.sh/chart: basic-persistent-0.1.0
    app.kubernetes.io/name: basic-persistent
    app.kubernetes.io/instance: rabbitmq
    app.kubernetes.io/managed-by: Helm
    lagoon.sh/service: rabbitmq
    lagoon.sh/service-type: basic-persistent
    lagoon.sh/project: project
    lagoon.sh/environment: production
    lagoon.sh/environmentType: production
    lagoon.sh/buildType: promote
  annotations:
    k8up.syn.tools/backup: "true"
    lagoon.sh/version: "24.5.1"
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: bulk
  resources:
    requests:
      storage: "1Gi"

The workaround produces the following template:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    k8up.io/backup: "false"
    k8up.syn.tools/backup: "false"
    lagoon.sh/branch: feature/ZBSALG-6097-cloudflare
    lagoon.sh/version: v2.21.0
  creationTimestamp: null
  labels:
    app.kubernetes.io/instance: custom-rabbitmq-data
    app.kubernetes.io/managed-by: build-deploy-tool
    app.kubernetes.io/name: rabbitmq-data
    lagoon.sh/buildType: branch
    lagoon.sh/environment: develop
    lagoon.sh/environmentType: development
    lagoon.sh/project: project
    lagoon.sh/service-type: additional-volume
    lagoon.sh/template: additional-volume-0.1.0
  name: custom-rabbitmq-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: bulk
status: {}

and an additional volume which we won't be using:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    k8up.io/backup: "true"
    k8up.syn.tools/backup: "true"
    lagoon.sh/branch: feature/ZBSALG-6097-cloudflare
    lagoon.sh/version: v2.21.0
  creationTimestamp: null
  labels:
    app.kubernetes.io/instance: rabbitmq
    app.kubernetes.io/managed-by: build-deploy-tool
    app.kubernetes.io/name: basic-persistent
    lagoon.sh/buildType: branch
    lagoon.sh/environment: feature-zbsalg-6097-cloudflare
    lagoon.sh/environmentType: development
    lagoon.sh/project: project
    lagoon.sh/service: rabbitmq
    lagoon.sh/service-type: basic-persistent
    lagoon.sh/template: basic-persistent-0.1.0
  name: rabbitmq
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Mi
  storageClassName: bulk
status: {}

shreddedbacon commented 3 days ago

Looking at it a bit closer, you'll see that the one in your workaround is named differently: custom-rabbitmq-data vs rabbitmq-data.

If you deploy this in an existing production environment, you'll then have 3 volumes for your rabbitmq (the original rabbitmq-data, plus the new custom-rabbitmq-data and the unused rabbitmq).

I've got a fix pending in a pull request that might have a test case that looks familiar.

sjerdo commented 2 days ago

You're right: that workaround works for spinning up new environments, but with existing environments we end up with 3 volumes.

Another workaround would be to introduce a new no-op basic-persistent service named rabbitmq-data. But it would be better if that pull request is merged to main soon.
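
For reference, a no-op service along those lines could look roughly like this in docker-compose.yml (untested sketch; the uselagoon/commons image is just a placeholder for any minimal container):

services:
  rabbitmq-data:
    image: uselagoon/commons # placeholder; any minimal image would do
    labels:
      lagoon.type: basic-persistent
      lagoon.autogeneratedroute: false
      # With the service itself named rabbitmq-data, the templated PVC
      # should get the name that the rabbitmq service already mounts
      # via its lagoon.persistent.name label
      lagoon.persistent: /var/lib/rabbitmq
      lagoon.persistent.size: 1Gi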