openanalytics / shinyproxy-operator

Easily run ShinyProxy on a Kubernetes cluster
https://shinyproxy.io
Apache License 2.0

Old shinyproxy pods not deleted by the operator due to operator exceptions being thrown. #43

Closed: leynebe closed this issue 5 months ago

leynebe commented 6 months ago

I scoured the ShinyProxy Operator config docs and, after not finding an option to terminate old ShinyProxy pods, discovered that the operator itself was throwing errors. Two errors stand out to me in the operator logs. This is the ShinyProxy resource I am deploying:

apiVersion: openanalytics.eu/v1alpha1
kind: ShinyProxy
metadata:
  name: shinyproxy-dev
  namespace: dev
spec:
  spring:
    session:
      store-type: redis
    redis:
      password: <REDACTED>
      sentinel:
        master: shinyproxy
        password: <REDACTED>
        nodes: <REDACTED>,<REDACTED>,<REDACTED>
  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics
    metrics:
      export:
        prometheus:
          enabled: true
  server:
      secureCookies: true
      frameOptions: sameorigin
      forward-headers-strategy: native
      servlet:
        multipart:
          max-file-size: 50MB
          max-request-size: 50MB
  logging:
    file:
      name: shinyproxy.log
    level:
      io.undertow: DEBUG
      eu.openanalytics: DEBUG
      org.springframework: DEBUG
  proxy:
      store-mode: Redis
      stop-proxies-on-shutdown: false
      title: Development
      logoUrl: ""
      landing-page: /
      heartbeat-rate: 10000 #in milliseconds
      heartbeat-timeout: 60000 #in milliseconds
      container-wait-time: 60000 #in milliseconds
      default-proxy-max-lifetime: 1440 #in minutes
      port: 8080
      authentication: openid
      openid:
        auth-url: https://<REDACTED>/oauth2/v2.0/authorize
        token-url: https://<REDACTED>/oauth2/v2.0/token
        jwks-url: https://<REDACTED>/discovery/v2.0/keys
        client-id: <REDACTED>
        client-secret: <REDACTED>
        username-attribute: email
        roles-claim: roles
      usage-stats-url: micrometer
      container-backend: kubernetes
      kubernetes:
        internal-networking: true
        namespace: dev
        pod-wait-time: 600000 #in milliseconds
        image-pull-policy: IfNotPresent
        image-pull-secrets:
        - name: docker
      template-path: ./templates
      template-groups:
      - id: demo
        properties:
          display-name: DEMO
      specs: []
  kubernetesPodTemplateSpecPatches: |
      - op: add
        path: /spec/containers/0/env/-
        value:
          name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis
              key: redis-password
      - op: add
        path: /spec/containers/0/env/-
        value:
          name: dev
          valueFrom:
            secretKeyRef:
              name: secret
              key: dev
      - op: replace
        path: /spec/containers/0/livenessProbe
        value:
          failureThreshold: 2
          httpGet:
            path: /actuator/health/liveness
            port: 9090
            scheme: HTTP
          periodSeconds: 1
          initialDelaySeconds: 140
          successThreshold: 1
          timeoutSeconds: 1
      - op: replace
        path: /spec/containers/0/readinessProbe
        value:
          failureThreshold: 2
          httpGet:
            path: /actuator/health/readiness
            port: 9090
            scheme: HTTP
          periodSeconds: 1
          initialDelaySeconds: 140
          successThreshold: 1
          timeoutSeconds: 1
      - op: add
        path: /spec/volumes/-
        value:
          name: shinyproxy-templates-dev
          persistentVolumeClaim:
            claimName: shinyproxy-templates-dev
      - op: add
        path: /spec/containers/0/volumeMounts/-
        value:
          mountPath: "/opt/shinyproxy/templates"
          name: shinyproxy-templates-dev
      - op: add
        path: /spec/containers/0/resources
        value:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 0.5
            memory: 1Gi
      - op: add
        path: /spec/serviceAccountName
        value: default
  kubernetesIngressPatches: |
    - op: add
      path: /metadata/annotations
      value:
        nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/affinity: cookie
        nginx.ingress.kubernetes.io/proxy-read-timeout: "420"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "420"
        nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
        nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
        nginx.ingress.kubernetes.io/proxy-body-size: 5000m
        cert-manager.io/cluster-issuer: sectigo
    - op: add
      path: /spec/ingressClassName
      value: nginx
    - op: add
      path: /spec/tls
      value:
        - hosts:
          - dev.example.com 
          secretName: dev-tls
  image: example.com/openanalytics/shinyproxy:3.0.2
  imagePullPolicy: Always
  image-pull-secrets:
  - name: docker
  replicas: 1
  fqdn: dev.example.com

If there's anything I can try, let me know.

cdenneen commented 6 months ago

@leynebe any luck in fixing this?

leynebe commented 6 months ago

@cdenneen No. I have not had the time to properly investigate. I was hoping you guys had an easy solution or some tips. It's getting more urgent though: none of the ShinyProxy pods are killed upon upgrade, so the number of pods that are just left running and wasting money is getting obscene. I'll have a look when I find the time or priorities change.

cdenneen commented 6 months ago

@leynebe I don't work on this project; I'm just a user like yourself. If you find a solution, let me know. Otherwise, the only thing I can think of is a CronJob to find and kill lingering pods. I'm actually having more of an issue with the proxied apps staying around. I have stop-proxies-on-shutdown: false, so I do expect them to stay around for a bit, but when I tested and clicked the same app again, it spun up a new pod rather than reusing the one that had already been running for 4 hours.
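
A minimal sketch of such a cleanup CronJob is shown below. The namespace, label selector, age threshold, and the pod-cleaner ServiceAccount are all assumptions (the ServiceAccount needs RBAC permissions to list and delete pods), and this crude age check does not distinguish pods that still have active sessions, so treat it as a starting point only:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-stale-shinyproxy-pods
  namespace: dev
spec:
  schedule: "0 * * * *"                 # run hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # hypothetical SA with list/delete permissions on pods
          restartPolicy: Never
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest   # any image providing kubectl and bash works
            command:
            - /bin/bash
            - -c
            - |
              # delete pods matching the (assumed) label selector that are older than 4 hours
              now=$(date +%s)
              kubectl get pods -n dev -l app=shinyproxy \
                -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.creationTimestamp}{"\n"}{end}' |
              while read -r name created; do
                age=$(( now - $(date -d "$created" +%s) ))
                if [ "$age" -gt 14400 ]; then
                  kubectl delete pod "$name" -n dev
                fi
              done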

LEDfan commented 5 months ago

Hi @leynebe

I looked into this and found that this is being caused by the following part of your configuration:

  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics

This property tells Spring which actuator (management) endpoints to expose. The default ShinyProxy configuration includes an endpoint called recyclable (https://github.com/openanalytics/containerproxy/blob/2c71c88a0f8a8f71e2551343e09b659c6f11c1fe/src/main/java/eu/openanalytics/containerproxy/ContainerProxyApplication.java#L342), which the operator uses to check whether a ShinyProxy instance still has active websocket connections. Because your include list overrides the default and does not contain recyclable, that endpoint is no longer exposed, the operator's check fails, and the old instances are never cleaned up.

Usually there is no need to configure this option, so I would advise simply removing it. If you do need it, it would be useful to know why, so that we can either better cover this use case or document it on the website.
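
If you do want to keep exposing your own list of endpoints, a sketch of a configuration that also exposes the recyclable endpoint the operator relies on (the endpoint id is taken from the containerproxy source linked above) would be:

  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics,recyclable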

Once you remove this option, the old instances that already exist will still not be removed automatically. You can either re-deploy ShinyProxy (by removing the custom resource and re-creating it), or manually remove the old instances from the status of the ShinyProxy resource, using:

kubectl edit shinyproxy <crd_name> -n <namespace> --subresource='status'
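
If you first want to see which instances are currently tracked, you can inspect the resource (including its status) read-only with something like:

kubectl get shinyproxy <crd_name> -n <namespace> -o yaml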

Next, you need to remove the old ReplicaSets and ConfigMaps created by ShinyProxy.
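
A rough sketch of that cleanup (the resource names below are placeholders; take the real names from the listing and only delete the ones belonging to old instances):

# list the ReplicaSets and ConfigMaps in the ShinyProxy namespace
kubectl get replicasets,configmaps -n <namespace>

# delete the ones that belong to old, stale instances
kubectl delete replicaset <old-replicaset-name> -n <namespace>
kubectl delete configmap <old-configmap-name> -n <namespace>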

leynebe commented 5 months ago

@LEDfan

> Usually there is no need to configure this option, so I would advise simply removing it. If you do need it, it would be useful to know why, so that we can either better cover this use case or document it on the website.

I will try adding this default option back. Thanks for the research and explanation!!

leynebe commented 5 months ago

@LEDfan This worked. Thanks a bunch!