syndesisio / syndesis

This project is archived. Syndesis is a flexible, customizable, open source platform that provides core integration capabilities as a service.
https://syndesis.io/
Apache License 2.0

Unable to upgrade using operator - upgrade pod - pod has unbound PersistentVolumeClaims #4010

Closed: avano closed this issue 5 years ago

avano commented 6 years ago

This is a...


[ ] Feature request
[ ] Regression (a behavior that used to work and stopped working in a new release)
[X] Bug report  
[ ] Documentation issue or request

Description

I tried to upgrade from "1.4.9" to "latest" (because there is currently still a hardcoded "latest" upgrade pod in the operator image, #3924). Then I tried from "1.5.5" to "latest" with the same result: the upgrade pod won't start, reporting "pod has unbound PersistentVolumeClaims" (screenshot: up1).

There is also this error in the operator log:

time="2018-11-01T11:58:12Z" level=error msg="error syncing key (myproject/app): object is being deleted: persistentvolumeclaims \"syndesis-upgrade\" already exists"

steps to reproduce:

./syndesis install -s
./syndesis install --project myproject --tag 1.4.9
# wait for syndesis to install
# deploy a new operator image, for example version 1.5.6

minishift v1.26.1+1e20f27
openshift 3.10
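
For anyone debugging the same state, the scheduler warning and the claim status can be inspected with standard oc commands; a minimal sketch, assuming the pod name that appears in the pod YAML later in this thread:

# Is the upgrade claim Bound, or stuck Pending?
oc get pvc syndesis-upgrade -n myproject

# The Events section should show the "pod has unbound PersistentVolumeClaims" warning
oc describe pod syndesis-upgrade-latest -n myproject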

heiko-braun commented 6 years ago

@nicolaferraro @rhuss Can you two take a look at this?

nicolaferraro commented 6 years ago

@avano I'm not able to reproduce it.

Here's the list of steps I'm doing:

syndesis install --project myproject --tag 1.4.9
# wait for everything to install
syndesis install --operator-only --tag 1.5.6

The result is as expected. The error about the persistent volume claim already existing is not a real issue; it will no longer appear after #3861.

Operator logs:

  | time="2018-11-06T00:04:51Z" level=info msg="Go Version: go1.10.3"
  | time="2018-11-06T00:04:51Z" level=info msg="Go OS/Arch: linux/amd64"
  | time="2018-11-06T00:04:51Z" level=info msg="operator-sdk Version: 0.0.5+git"
  | time="2018-11-06T00:04:51Z" level=info msg="Using template /conf/syndesis-template.yml"
  | time="2018-11-06T00:04:51Z" level=info msg="Watching syndesis.io/v1alpha1, Syndesis, myproject, 5"
  | time="2018-11-06T00:04:51Z" level=info msg="No legacy Syndesis installations detected in the myproject namespace"
  | time="2018-11-06T00:04:51Z" level=info msg="Syndesis legacy installations check completed"
  | time="2018-11-06T00:04:51Z" level=info msg="Starting upgrade of Syndesis resource app from version 1.4.9 to version 1.5.6"
  | time="2018-11-06T00:04:51Z" level=info msg="Upgrading syndesis resource app from version 1.4.9 to 1.5.6"
  | time="2018-11-06T00:04:51Z" level=error msg="error syncing key (myproject/app): object is being deleted: persistentvolumeclaims \"syndesis-upgrade\" already exists"
  | time="2018-11-06T00:04:52Z" level=info msg="Upgrading syndesis resource app from version 1.4.9 to 1.5.6"
  | time="2018-11-06T00:04:56Z" level=info msg="Syndesis resource app is currently being upgraded to version 1.5.6"
  | time="2018-11-06T00:05:01Z" level=info msg="Syndesis resource app is currently being upgraded to version 1.5.6"
  | time="2018-11-06T00:05:06Z" level=info msg="Syndesis resource app is currently being upgraded to version 1.5.6"
  | time="2018-11-06T00:05:11Z" level=info msg="Syndesis resource app is currently being upgraded to version 1.5.6"

The upgrade pod is bound and completes successfully.

The redeployment does not stop, because #3924 is not merged and the tag is hardcoded to latest, but I don't see the error you're reporting here.

Are you doing something different?

avano commented 6 years ago

@nicolaferraro strange, I tried again with the same steps as you and I still see the same outcome. I'll ask around to find out whether I'm the only one with this problem and will let you know.

tplevko commented 6 years ago

@nicolaferraro I was also able to reproduce the issue @avano describes. Which OpenShift version are you using, please? We both hit this issue on 3.10.

nicolaferraro commented 6 years ago

This is my env:

minishift v1.25.0+90fb23e
openshift v3.10.0+349c70c-73
kubernetes v1.10.0+b81c8f8

Can you share the YAML of the upgrade pod so we can understand why it's not mounting the volume?
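
For reference, a minimal way to dump that YAML with oc, assuming the pod name the operator gives the upgrade pod (it appears in the reply below):

oc get pod syndesis-upgrade-latest -n myproject -o yaml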

nicolaferraro commented 6 years ago

There are a number of known OpenShift issues related to subPath mounting:

...
volumeMounts:
- mountPath: /opt/backup
  subPath: backup
  name: backup-dir

It may be an OpenShift issue.

avano commented 6 years ago

apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: '2018-11-06T08:20:13Z'
  name: syndesis-upgrade-latest
  namespace: myproject
  ownerReferences:
    - apiVersion: syndesis.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: Syndesis
      name: app
      uid: 669ea73b-e19c-11e8-9735-080027629ecd
  resourceVersion: '10084'
  selfLink: /api/v1/namespaces/myproject/pods/syndesis-upgrade-latest
  uid: c90085a9-e19c-11e8-9735-080027629ecd
spec:
  containers:
    - args:
        - '--backup'
        - /opt/backup
      env:
        - name: SYNDESIS_UPGRADE_PROJECT
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
      image: 'docker.io/syndesis/syndesis-upgrade:latest'
      imagePullPolicy: IfNotPresent
      name: upgrade
      resources: {}
      securityContext:
        capabilities:
          drop:
            - KILL
            - MKNOD
            - SETGID
            - SETUID
        runAsUser: 1000140000
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /opt/backup
          name: backup-dir
          subPath: backup
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: syndesis-operator-token-mzww7
          readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
    - name: syndesis-operator-dockercfg-k69hn
  nodeName: localhost
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000140000
    seLinuxOptions:
      level: 's0:c12,c4'
  serviceAccount: syndesis-operator
  serviceAccountName: syndesis-operator
  terminationGracePeriodSeconds: 30
  volumes:
    - name: backup-dir
      persistentVolumeClaim:
        claimName: syndesis-upgrade
    - name: syndesis-operator-token-mzww7
      secret:
        defaultMode: 420
        secretName: syndesis-operator-token-mzww7
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2018-11-06T08:20:14Z'
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2018-11-06T08:20:14Z'
      message: 'containers with unready status: [upgrade]'
      reason: ContainersNotReady
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2018-11-06T08:20:14Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - image: 'docker.io/syndesis/syndesis-upgrade:latest'
      imageID: ''
      lastState: {}
      name: upgrade
      ready: false
      restartCount: 0
      state:
        waiting:
          message: >-
            failed to prepare subPath for volumeMount "backup-dir" of container
            "upgrade"
          reason: CreateContainerConfigError
  hostIP: 10.0.2.15
  phase: Pending
  podIP: 172.17.0.3
  qosClass: BestEffort
  startTime: '2018-11-06T08:20:14Z'

nicolaferraro commented 6 years ago

It seems so.

This should be the first time we use the upgrade volume. To rule out issues tied to specific OpenShift versions, I think we can safely remove the subPath property. It was needed when the volume was shared with the database, but it no longer makes sense now that the upgrade has its own volume.
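
A minimal sketch of what that change could look like in the upgrade pod spec, reusing the names from the YAML above; without subPath the container mounts the whole syndesis-upgrade volume at /opt/backup instead of only its backup/ subdirectory:

volumeMounts:
  - mountPath: /opt/backup   # whole volume mounted directly, no subPath: backup
    name: backup-dir
volumes:
  - name: backup-dir
    persistentVolumeClaim:
      claimName: syndesis-upgrade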

heiko-braun commented 6 years ago

@avano @nicolaferraro What oc version are you using? Could it be related?

nicolaferraro commented 6 years ago

No, I don't think the oc version is related, because the resources that are having the problem are not created by oc directly; they are created by the operator.

The issue in the event monitor seems related to subPath mounting, a feature we no longer actually need since we have a dedicated volume for the upgrade now.

avano commented 6 years ago

By the way, I tried to upgrade from 1.4.9 to "master" (latest, including your changes) and it works well now.
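
For completeness, a sketch of that upgrade path, reusing the flags from the earlier comment; whether --tag accepts master here is an assumption:

syndesis install --operator-only --tag master   # operator image built from master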

heiko-braun commented 6 years ago

@avano if it works well, can you close the issue?

avano commented 6 years ago

@heiko-braun I would prefer to leave this open until the fix gets into a prod build, because CR1 is affected as well at the moment.

heiko-braun commented 6 years ago

@avano Yes, makes sense.

avano commented 5 years ago

Fixed in the CR2 prod build.