openanalytics / shinyproxy-operator

Easily run ShinyProxy on a Kubernetes cluster
https://shinyproxy.io
Apache License 2.0

Old shinyproxy pods not deleted by the operator due to operator exceptions being thrown. #43

Closed: leynebe closed this issue 5 months ago

leynebe commented 6 months ago

I scoured the ShinyProxy Operator config docs and, after not finding an option to terminate old ShinyProxy pods, discovered that the operator itself was throwing errors. Two errors stand out to me in the operator logs. This is the ShinyProxy resource I am deploying:

apiVersion: openanalytics.eu/v1alpha1
kind: ShinyProxy
metadata:
  name: shinyproxy-dev
  namespace: dev
spec:
  spring:
    session:
      store-type: redis
    redis:
      password: <REDACTED>
      sentinel:
        master: shinyproxy
        password: <REDACTED>
        nodes: <REDACTED>,<REDACTED>,<REDACTED>
  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics
    metrics:
      export:
        prometheus:
          enabled: true
  server:
      secureCookies: true
      frameOptions: sameorigin
      forward-headers-strategy: native
      servlet:
        multipart:
          max-file-size: 50MB
          max-request-size: 50MB
  logging:
    file:
      name: shinyproxy.log
    level:
      io.undertow: DEBUG
      eu.openanalytics: DEBUG
      org.springframework: DEBUG
  proxy:
      store-mode: Redis
      stop-proxies-on-shutdown: false
      title: Development
      logoUrl: ""
      landing-page: /
      heartbeat-rate: 10000 #in milliseconds
      heartbeat-timeout: 60000 #in milliseconds
      container-wait-time: 60000 #in milliseconds
      default-proxy-max-lifetime: 1440 #in minutes
      port: 8080
      authentication: openid
      openid:
        auth-url: https://<REDACTED>/oauth2/v2.0/authorize
        token-url: https://<REDACTED>/oauth2/v2.0/token
        jwks-url: https://<REDACTED>/discovery/v2.0/keys
        client-id: <REDACTED>
        client-secret: <REDACTED>
        username-attribute: email
        roles-claim: roles
      usage-stats-url: micrometer
      container-backend: kubernetes
      kubernetes:
        internal-networking: true
        namespace: dev
        pod-wait-time: 600000 #in milliseconds
        image-pull-policy: IfNotPresent
        image-pull-secrets:
        - name: docker
      template-path: ./templates
      template-groups:
      - id: demo
        properties:
          display-name: DEMO
      specs: []
  kubernetesPodTemplateSpecPatches: |
      - op: add
        path: /spec/containers/0/env/-
        value:
          name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis
              key: redis-password
      - op: add
        path: /spec/containers/0/env/-
        value:
          name: dev
          valueFrom:
            secretKeyRef:
              name: secret
              key: dev
      - op: replace
        path: /spec/containers/0/livenessProbe
        value:
          failureThreshold: 2
          httpGet:
            path: /actuator/health/liveness
            port: 9090
            scheme: HTTP
          periodSeconds: 1
          initialDelaySeconds: 140
          successThreshold: 1
          timeoutSeconds: 1
      - op: replace
        path: /spec/containers/0/readinessProbe
        value:
          failureThreshold: 2
          httpGet:
            path: /actuator/health/readiness
            port: 9090
            scheme: HTTP
          periodSeconds: 1
          initialDelaySeconds: 140
          successThreshold: 1
          timeoutSeconds: 1
      - op: add
        path: /spec/volumes/-
        value:
          name: shinyproxy-templates-dev
          persistentVolumeClaim:
            claimName: shinyproxy-templates-dev
      - op: add
        path: /spec/containers/0/volumeMounts/-
        value:
          mountPath: "/opt/shinyproxy/templates"
          name: shinyproxy-templates-dev
      - op: add
        path: /spec/containers/0/resources
        value:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 0.5
            memory: 1Gi
      - op: add
        path: /spec/serviceAccountName
        value: default
  kubernetesIngressPatches: |
    - op: add
      path: /metadata/annotations
      value:
        nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/affinity: cookie
        nginx.ingress.kubernetes.io/proxy-read-timeout: "420"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "420"
        nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
        nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
        nginx.ingress.kubernetes.io/proxy-body-size: 5000m
        cert-manager.io/cluster-issuer: sectigo
    - op: add
      path: /spec/ingressClassName
      value: nginx
    - op: add
      path: /spec/tls
      value:
        - hosts:
          - dev.example.com 
          secretName: dev-tls
  image: example.com/openanalytics/shinyproxy:3.0.2
  imagePullPolicy: Always
  image-pull-secrets:
  - name: docker
  replicas: 1
  fqdn: dev.example.com

If there's anything I can try, let me know.

cdenneen commented 6 months ago

@leynebe any luck in fixing this?

leynebe commented 6 months ago

@cdenneen No. I have not had the time to properly investigate. I was hoping you guys had an easy solution or some tips. It's getting more urgent though: none of the ShinyProxy pods are killed upon upgrade, so the number of pods that are just left running and wasting money is getting obscene. I'll have a look when I find the time or priorities change.

cdenneen commented 6 months ago

@leynebe I don't work on this project; I'm just a user like yourself. If you find a solution, let me know. Otherwise, the only thing I can think of is a CronJob to find and kill lingering pods. I'm actually having more of an issue with the proxied apps staying around. I have stop-proxies-on-shutdown: false, so I do expect them to stay around for a bit, but when I tested and clicked the same app again, it spun up a new pod rather than reusing the one that had already been running for 4 hours.
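
A minimal sketch of such a cleanup CronJob is shown below. The namespace, label selector, age threshold, and the pod-cleaner ServiceAccount are all assumptions (the ServiceAccount needs RBAC permissions to list and delete pods), and this crude age check does not distinguish pods that still have active sessions, so treat it as a starting point only:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-stale-shinyproxy-pods
  namespace: dev
spec:
  schedule: "0 * * * *"                 # run hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # hypothetical SA with list/delete permissions on pods
          restartPolicy: Never
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest   # any image providing kubectl and bash works
            command:
            - /bin/bash
            - -c
            - |
              # delete pods matching the (assumed) label selector that are older than 4 hours
              now=$(date +%s)
              kubectl get pods -n dev -l app=shinyproxy \
                -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.creationTimestamp}{"\n"}{end}' |
              while read -r name created; do
                age=$(( now - $(date -d "$created" +%s) ))
                if [ "$age" -gt 14400 ]; then
                  kubectl delete pod "$name" -n dev
                fi
              done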

LEDfan commented 5 months ago

Hi @leynebe

I looked into this and found that this is being caused by the following part of your configuration:

  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics

This property tells Spring which actuator (management) endpoints to expose. The default ShinyProxy configuration includes an endpoint called recyclable (https://github.com/openanalytics/containerproxy/blob/2c71c88a0f8a8f71e2551343e09b659c6f11c1fe/src/main/java/eu/openanalytics/containerproxy/ContainerProxyApplication.java#L342), which the operator uses to check whether a ShinyProxy instance still has active websocket connections. Because your include list overrides the default and does not contain recyclable, that endpoint is no longer exposed, the operator's check fails, and the old instances are never cleaned up.

Usually there is no need to configure this option, so I would advise simply removing it. If you do need it, it would be useful to know why, so that we can either better cover this use case or document it on the website.
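
If you do want to keep exposing your own list of endpoints, a sketch of a configuration that also exposes the recyclable endpoint the operator relies on (the endpoint id is taken from the containerproxy source linked above) would be:

  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics,recyclable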

Once you remove this option, the old instances that already exist will still not be removed automatically. You can either re-deploy ShinyProxy (by removing the custom resource and re-creating it), or manually remove the old instances from the status of the ShinyProxy resource, using:

kubectl edit shinyproxy <crd_name> -n <namespace> --subresource='status'
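
If you first want to see which instances are currently tracked, you can inspect the resource (including its status) read-only with something like:

kubectl get shinyproxy <crd_name> -n <namespace> -o yaml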

Next, you need to remove the old ReplicaSets and ConfigMaps created by ShinyProxy.
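
A rough sketch of that cleanup (the resource names below are placeholders; take the real names from the listing and only delete the ones belonging to old instances):

# list the ReplicaSets and ConfigMaps in the ShinyProxy namespace
kubectl get replicasets,configmaps -n <namespace>

# delete the ones that belong to old, stale instances
kubectl delete replicaset <old-replicaset-name> -n <namespace>
kubectl delete configmap <old-configmap-name> -n <namespace>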

leynebe commented 5 months ago

@LEDfan

> Usually there is no need to configure this option, so I would advise simply removing it. If you do need it, it would be useful to know why, so that we can either better cover this use case or document it on the website.

I will try adding this default option back. Thanks for the research and explanation!!

leynebe commented 5 months ago

@LEDfan This worked. Thanks a bunch!