janus-idp / operator

Deprecated - Operator for Backstage, based on the Operator SDK framework - see https://github.com/redhat-developer/rhdh-operator
Apache License 2.0

`spec.serviceName` field in local DB StatefulSet operand is hardcoded and not correct #370

**Closed** · rm3l closed this issue 5 months ago

rm3l commented 5 months ago

/kind bug

Follow-up issue discovered while working on https://github.com/janus-idp/operator/pull/369

## What versions of software are you using?

## What did you run exactly?

```shell
kubectl get service --selector=app.kubernetes.io/name=backstage,app.kubernetes.io/instance=bs1
```


## Actual behavior

- Here are the services created by the operator:

```shell
$ kubectl get service --selector=app.kubernetes.io/name=backstage,app.kubernetes.io/instance=bs1
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
backstage-bs1      ClusterIP   172.31.39.19    <none>        80/TCP     6m36s
backstage-db-bs1   ClusterIP   172.31.20.189   <none>        5432/TCP   6m36s
```


- And the spec of the DB StatefulSet, where we can see that `spec.serviceName` is hardcoded to `backstage-psql-cr1-hl` (and `spec.template.metadata.name` is also hardcoded to `backstage-db-cr1`, but this might not be an issue per se).

<details>
<summary>StatefulSet spec</summary>

```yaml
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Retain
  whenScaled: Retain
podManagementPolicy: OrderedReady
replicas: 1
revisionHistoryLimit: 10
selector:
  matchLabels:
    rhdh.redhat.com/app: backstage-db-bs1
serviceName: backstage-psql-cr1-hl
template:
  metadata:
    creationTimestamp: null
    labels:
      rhdh.redhat.com/app: backstage-db-bs1
    name: backstage-db-cr1
  spec:
    automountServiceAccountToken: false
    containers:
      - env:
          - name: POSTGRESQL_PORT_NUMBER
            value: "5432"
          - name: POSTGRESQL_VOLUME_DIR
            value: /var/lib/pgsql/data
          - name: PGDATA
            value: /var/lib/pgsql/data/userdata
        envFrom:
          - secretRef:
              name: backstage-db-bs1
        image: quay.io/fedora/postgresql-15:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
              - /bin/sh
              - -c
              - exec pg_isready -U ${POSTGRES_USER} -h 127.0.0.1 -p 5432
          failureThreshold: 6
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: postgresql
        ports:
          - containerPort: 5432
            name: tcp-postgresql
            protocol: TCP
        readinessProbe:
          exec:
            command:
              - /bin/sh
              - -c
              - -e
              - |
                exec pg_isready -U ${POSTGRES_USER} -h 127.0.0.1 -p 5432
          failureThreshold: 6
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 250m
            ephemeral-storage: 20Mi
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
          runAsGroup: 0
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
          - mountPath: /dev/shm
            name: dshm
          - mountPath: /var/lib/pgsql/data
            name: data
    dnsPolicy: ClusterFirst
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    volumes:
      - emptyDir:
          medium: Memory
        name: dshm
updateStrategy:
  rollingUpdate:
    partition: 0
  type: RollingUpdate
volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      volumeMode: Filesystem
    status:
      phase: Pending
```

</details>

## Expected behavior

Per the official K8s docs, a headless Service is required for StatefulSets and must exist:

> StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.

Not sure why K8s does not return an error for this non-existing Service, but the expected behavior would be:

- `spec.serviceName` should not be hardcoded. Instead, it could have a value like `backstage-psql-<CR name>-hl` (and this headless Service should be created by the operator as well)
- `spec.template.metadata.name` should not be hardcoded. Instead, it could have a value like `backstage-psql-<CR name>`
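As an illustration, a minimal sketch of such an operator-created headless Service for this CR might look like the following (the name `backstage-psql-bs1-hl` is hypothetical, just following the `backstage-psql-<CR name>-hl` naming pattern; the selector reuses the Pod label from the StatefulSet spec above):

```yaml
# Hypothetical headless Service the operator could create for the DB StatefulSet.
# clusterIP: None is what makes the Service headless: no load-balancing VIP is
# allocated, and each Pod gets a stable DNS name of the form
# <pod-name>.<service-name>.<namespace>.svc.cluster.local.
apiVersion: v1
kind: Service
metadata:
  name: backstage-psql-bs1-hl        # illustrative, per the backstage-psql-<CR name>-hl pattern
spec:
  clusterIP: None                    # headless
  selector:
    rhdh.redhat.com/app: backstage-db-bs1   # must match the StatefulSet's Pod labels
  ports:
    - name: tcp-postgresql
      port: 5432
```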

## Any logs, error output, etc?

gazarenkov commented 5 months ago

Our StatefulSet has never worked with a headless service, which means only a single-replica configuration would work correctly. Indeed, to make it work correctly with more than one Pod/PV pair, it has to be a headless service instead of a load-balancing one.

I would not qualify it as a bug, though; it is rather a limitation of the local (not recommended for production) database setup, and, again, it has never worked with a headless service yet.

> `spec.serviceName` should not be hardcoded. Instead, it could have a value like `backstage-psql-<CR name>-hl` (and this headless Service should be created by the operator as well)

I do not see why we need both an "-hl" and a "non-hl" service here. If it works with a headless service, let's keep only that one (`backstage-psql-<CR name>` as the `serviceName`).
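A minimal illustration of that suggestion, assuming a single headless Service per CR (all names here are hypothetical, following the `backstage-psql-<CR name>` pattern):

```yaml
# Illustrative StatefulSet fragment: with only one headless Service named
# backstage-psql-bs1 (no separate "-hl" variant), the StatefulSet references
# it directly via spec.serviceName.
spec:
  serviceName: backstage-psql-bs1   # matches the headless Service's metadata.name
  selector:
    matchLabels:
      rhdh.redhat.com/app: backstage-db-bs1
```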

> `spec.template.metadata.name` should not be hardcoded. Instead, it could have a value like `backstage-psql-<CR name>`

Unfortunately, it looks like the StatefulSet does not respect it (otherwise, there would have been no need to rename the StatefulSet). The template for Pod names is still `<StatefulSet name>-<ordinal>`. We can just remove it.