zdt-upgrade - deployment should not have initContainer on upgrade job #641

Open charoensri opened 10 months ago

charoensri commented 10 months ago

Describe the bug Should the deployment YAML still depend on the upgrade job (through the initContainer) at the end of the upgrade-deploy action?

I tried upgradeType: "zero-downtime" on Docker Desktop, and it crashed before the post-upgrade job. I noticed that the Deployment spec.template produced by the upgrade-deploy action contains generated initContainers:

To Reproduce

Expected behavior

Source: pega/templates/pega-tier-deployment.yaml

Should the initContainers that wait on the zdt upgrade job remain in the Deployment spec after the upgrade? Or is this because my Docker Desktop crashed just before the post-upgrade job started? NOTE: both the pre and zdt upgrade jobs completed. The DB was upgraded successfully and the new ReplicaSet recycled the pods without issues, so everything is OK from the application and upgrade perspective. However, the Deployment spec (and therefore the pod spec) now has the initContainer in it. Once I deleted the upgrade job, any new pod failed to start because the initContainer failed. I fixed this by running another helm upgrade using only the deploy action with the new rules schema.

Chart version I cloned this chart and use it locally

apiVersion: v1
name: pega
version: "1.2.0"
description: Pega installation on kubernetes
keywords:

Server (if applicable, please complete the following information): PostgreSQL, Docker Desktop

Additional context

Source: pega/templates/pega-tier-deployment.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
  name: pega-dockerdesktop-web
  namespace: pega883
  labels:
    app: pega-dockerdesktop-web
    component: Pega
spec:
  # Replicas specify the number of copies for pega-dockerdesktop-web
  replicas: 1
  progressDeadlineSeconds: 2147483647
  selector:
    matchLabels:
      app: pega-dockerdesktop-web
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: pega-dockerdesktop-web
      annotations:
        config-check: 41181778004bd56b9c2cf77c7d9e9bdec0eb73e9e5980e0d95159dc69621efac
        config-tier-check: 7060cc4a89b2696a22ccca2b06eb060f204cb55ff63a90469297eeffec62c403
        certificate-check: 2cb1f675c5f532bd68c3851872bf42719f0516208049d403a84068dac54c695c


  # Volume used to mount config files.
  - name: pega-volume-config
      # This name will be referred in the volume mounts kind.
      name: pega-dockerdesktop-web
      # Used to specify permissions on files within the volume.
      defaultMode: 420      
  - name: pega-volume-credentials
      defaultMode: 420
      - secret:
          name: seri-pega-secrets    
      - secret:
          name: pega-hz-secret    
      - secret:
          name: pega-stream-secret    
      - secret:
          name: pega-dds-secret    

      - secret:
          name: pega-diagnostic-secret

  - name: wait-for-pegaupgrade
    image: pegasystems/k8s-wait-for
    imagePullPolicy: IfNotPresent
    args: [ 'job', 'pega-zdt-upgrade']
    - name: WAIT_TIME
      value: "2"
    - name: MAX_RETRIES
      value: "1"
      # Resources requests/limits for initContainers
        cpu: 50m
        memory: 64Mi
        cpu: 50m
        memory: 64Mi
    runAsUser: 9001
    fsGroup: 0
  # Name of the container
  - name: pega-web-tomcat
    # The pega image, you may use the official pega distribution or you may extend
    # and host it yourself.  See the image documentation for more information.
    image: charoensri1seri1/pega:8.8.3
    # Pod (app instance) listens on this port
    - containerPort: 8080
      name: pega-web-port
    - containerPort: 8443
      name: pega-tls-port
    # Specify any of the container environment variables here
    # Node type of the Pega nodes for pega-dockerdesktop-web
    - name: NODE_TYPE
      value: WebUser
      value: prweb
      value: "900"
    # Additional JVM arguments
    - name: JAVA_OPTS
      value: ""
    # Additional CATALINA arguments
    - name: CATALINA_OPTS
      value: "-XX:InitialCodeCacheSize=256M -XX:ReservedCodeCacheSize=512M -XX:MetaspaceSize=784m -XX:MaxMetaspaceSize=1G -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:InitiatingHeapOccupancyPercent=75 -XX:MaxGCPauseMillis=300 -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc*,gc+ref=debug,gc+heap=debug,gc+age=trace:file=/usr/local/tomcat/logs/gc-%p-%t.log:tags,uptime,time,level:filecount=10,filesize=50m"
    # Initial JVM heap size, equivalent to -Xms
    - name: INITIAL_HEAP
      value: "4096m"
    # Maximum JVM heap size, equivalent to -Xmx
    - name: MAX_HEAP
      value: "8192m"
    # Tier of the Pega node
    - name: NODE_TIER
      value: dockerdesktop-web
    - name: RETRY_TIMEOUT
      value: "30"
    - name: MAX_RETRIES
      value: "4"
    - configMapRef:
        name: pega-environment-config
      # Maximum CPU and Memory that the containers for pega-dockerdesktop-web can use
        cpu: "3"
        memory: "14Gi"
      # CPU and Memory that the containers for pega-dockerdesktop-web request
        cpu: "200m"
        memory: "2Gi"
    # The given mountpath is mapped to volume with the specified name.  The config map files are mounted here.
    - name: pega-volume-config
      mountPath: "/opt/pega/config"
    - name: pega-volume-credentials
      mountPath: "/opt/pega/secrets"
    #mount custom certificates

    # LivenessProbe: indicates whether the container is live, i.e. running.
        path: "/prweb/PRRestService/monitor/pingService/ping"
        port: 8081
        scheme: HTTP
      initialDelaySeconds: 0
      timeoutSeconds: 20
      periodSeconds: 30
      successThreshold: 1
      failureThreshold: 3
    # ReadinessProbe: indicates whether the container is ready to service requests.
        path: "/prweb/PRRestService/monitor/pingService/ping"
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 0
      timeoutSeconds: 10
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    # StartupProbe: indicates whether the container has completed its startup process, and delays the LivenessProbe
        path: "/prweb/PRRestService/monitor/pingService/ping"
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 10
      timeoutSeconds: 10
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 30
  # Mentions the restart policy to be followed by the pod.  'Always' means that a new pod will always be created irrespective of type of the failure.
  restartPolicy: Always
  # Amount of time in which container has to gracefully shutdown.
  terminationGracePeriodSeconds: 300
  # Secret which is used to pull the image from the repository.  This secret contains docker login details for the particular user.
  # If the image is in a protected registry, you must specify a secret to access it.
  - name: pega-registry-secret

pega-roska commented 10 months ago

Right now, if something goes wrong and the jobs go away, you have to change the chart action to deploy to remove the init containers. We will investigate whether there are enhancements we could make so that recovery is more automatic.
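
For anyone hitting the same state, the workaround described above might look roughly like the values override below. This is a minimal sketch, not a confirmed fix: it assumes the chart reads the action from `global.actions.execute` and the rules schema from `global.jdbc.rulesSchema`, and the schema name shown is a hypothetical placeholder, not a value from this issue.

```yaml
# Sketch of a values override (assumed keys, see the note above).
global:
  actions:
    # Switch from "upgrade-deploy" back to a plain deploy so the
    # Deployment is re-rendered without the wait-for-pegaupgrade initContainer.
    execute: "deploy"
  jdbc:
    # Hypothetical placeholder: point this at the rules schema created by the upgrade.
    rulesSchema: "NEW_RULES_SCHEMA"
```

Passing a file like this to `helm upgrade` with `-f` on the existing release should produce a pod template without the init container, so new pods no longer depend on the deleted upgrade job.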