tryretool / retool-helm

Workflow worker failing to connect to Temporal #111

Open christopher-wong opened 1 year ago

The workflow worker appears to be failing to connect to Temporal, and my k8s cluster kills the pod because its liveness/readiness probes fail. Within Temporal, however, I do see the workflows scheduled.

{"level":"error","message":"Ran into error when scheduling UpdateTimedOutWorkflows Error: Failed to connect before the deadline","timestamp":"2023-07-17T20:17:08.541Z"}
/snapshot/retool_development/node_modules/@temporalio/client/node_modules/@grpc/grpc-js/build/src/client.js:77
 callback(new Error('Failed to connect before the deadline'));
 ^

Error: Failed to connect before the deadline
 at checkState (/snapshot/retool_development/node_modules/@temporalio/client/node_modules/@grpc/grpc-js/build/src/client.js:77:26)
 at Timeout._onTimeout (/snapshot/retool_development/node_modules/@temporalio/client/node_modules/@grpc/grpc-js/build/src/channel.js:525:17)
 at listOnTimeout (node:internal/timers:559:17)
 at processTimers (node:internal/timers:502:7)

In the logs I also see:

{"level":"info","message":"Skipping UpdateTimedOutWorkflows: Already exists","timestamp":"2023-07-17T20:25:11.086Z"}

This indicates a successful connection to Temporal, but the readiness/liveness probes never become healthy.
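To make the timing between the two log lines concrete, here is a minimal sketch that parses them and computes how long after the connection error the "Already exists" message appeared. The log lines are copied verbatim from above; the function names `parse_line` and `gap_seconds` are made up for illustration and are not part of Retool or Temporal.

```python
import json
from datetime import datetime

# The two log lines quoted in this issue, copied verbatim.
LOG_LINES = [
    '{"level":"error","message":"Ran into error when scheduling UpdateTimedOutWorkflows Error: Failed to connect before the deadline","timestamp":"2023-07-17T20:17:08.541Z"}',
    '{"level":"info","message":"Skipping UpdateTimedOutWorkflows: Already exists","timestamp":"2023-07-17T20:25:11.086Z"}',
]


def parse_line(line):
    """Parse one JSON log line into (timestamp, level, message)."""
    record = json.loads(line)
    # Timestamps in these logs are UTC with millisecond precision and a trailing "Z".
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return ts, record["level"], record["message"]


def gap_seconds(lines):
    """Seconds between the first error line and the first later info line."""
    events = sorted(parse_line(line) for line in lines)
    first_error = next(ts for ts, level, _ in events if level == "error")
    first_info = next(
        ts for ts, level, _ in events if level == "info" and ts > first_error
    )
    return (first_info - first_error).total_seconds()


gap = gap_seconds(LOG_LINES)
print(f"info line followed the error by {gap:.3f} s")
```

For the two lines above, the gap is roughly eight minutes. That would be consistent with the worker reaching Temporal only after one or more restarts, though the logs shown here do not confirm that on their own.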

values.yaml

config:
  licenseKey: ...
  encryptionKey: ...
  jwtSecret: ...

  postgresql:
    # Specify if postgresql subchart is disabled
    host: ...
    port: 5432
    db: postgres
    user: ...
    password: ...

ingress:
  enabled: false

postgresql:
  enabled: false

image:
  tag: "3.2.2"

replicaCount: 2

workflows:
  enabled: true
  replicaCount: 2
  temporal:
    enabled: true
    host: "temporal-frontend.temporal.svc.cluster.local"
    port: 7233
    namespace: retool

retool-temporal-services-helm:
  enabled: false
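One way to narrow this down is to check plain TCP reachability of the Temporal frontend from inside the cluster. The sketch below uses the host and port from `workflows.temporal` in the values.yaml above. It assumes a pod in the same cluster with Python available, which may not be true of the Retool image itself. The helper name `can_connect` is hypothetical.

```python
import socket

# Taken from workflows.temporal in the values.yaml above.
TEMPORAL_HOST = "temporal-frontend.temporal.svc.cluster.local"
TEMPORAL_PORT = 7233


def can_connect(host, port, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port opens within the timeout.

    DNS failures, refused connections, and timeouts all raise OSError
    subclasses, so every one of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

Calling `can_connect(TEMPORAL_HOST, TEMPORAL_PORT)` from a debug pod shows whether the service name resolves and the port accepts connections. A `True` result only rules out DNS and network-policy problems. It does not prove that the gRPC handshake succeeds or explain why the probes stay unhealthy.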