temporalio / helm-charts

Temporal Helm charts
MIT License

Temporal services are not starting when deployed in istio enabled namespace #338

Open mmlk09 opened 1 year ago

mmlk09 commented 1 year ago

What are you really trying to do?

Trying to deploy Temporal in an Istio-enabled namespace, along with a PostgreSQL DB, on a Kubernetes cluster. The deployment uses the Helm charts available at https://github.com/temporalio/helm-charts

Describe the bug

After deployment, the history, matching, and worker pods restart continuously with CrashLoopBackOff. Their logs are shown below.

Here is the status of the pods:

NAME                                       READY   STATUS             RESTARTS        AGE
temporaldb-7d7f9c67b8-qqxbb                2/2     Running            0               22m
temporaltest-admintools-7bf55d4b8f-t47m2   2/2     Running            0               16m
temporaltest-frontend-546c569b9b-g4gv4     2/2     Running            0               16m
temporaltest-history-8648bd7db9-xsnz4      1/2     CrashLoopBackOff   6 (74s ago)     16m
temporaltest-matching-bc9f69487-9vc95      1/2     CrashLoopBackOff   6 (52s ago)     16m
temporaltest-web-67d46c8748-zf4xn          2/2     Running            0               16m
temporaltest-worker-5c58f8cc58-c6jwd       1/2     Error              6 (4m32s ago)   16m

Logs:

{"level":"info","ts":"2022-12-07T17:48:04.721Z","msg":"bootstrap hosts fetched","service":"worker","bootstrap-hostports":"10.4.0.125:6939,10.4.0.124:6933,10.4.0.126:6935,10.4.0.127:6934","logging-call-at":"rpMonitor.go:285"}

{"level":"error","ts":"2022-12-07T17:48:46.557Z","msg":"unable to bootstrap ringpop. retrying","service":"worker","error":"join duration of 41.835509067s exceeded max 30s","logging-call-at":"ringpop.go:109","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap.func1\n\t/home/builder/temporal/common/membership/ringpop.go:109\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:170\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:171\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap\n\t/home/builder/temporal/common/membership/ringpop.go:114\ngo.temporal.io/server/common/membership.(*RingPop).Start\n\t/home/builder/temporal/common/membership/ringpop.go:84\ngo.temporal.io/server/common/membership.(*ringpopMonitor).Start\n\t/home/builder/temporal/common/membership/rpMonitor.go:135\ngo.temporal.io/server/common/resource.MembershipMonitorProvider.func1\n\t/home/builder/temporal/common/resource/fx.go:268\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:120\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).Start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:85\ngo.uber.org/fx.(*App).start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:683\ngo.uber.org/fx.withTimeout.func1\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:773"}

{"level":"fatal","ts":"2022-12-07T17:48:46.557Z","msg":"unable to bootstrap ringpop. exhausted all retries","service":"worker","error":"join duration of 41.835509067s exceeded max 30s","logging-call-at":"ringpop.go:116","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/home/builder/temporal/common/log/zap_logger.go:151\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap\n\t/home/builder/temporal/common/membership/ringpop.go:116\ngo.temporal.io/server/common/membership.(*RingPop).Start\n\t/home/builder/temporal/common/membership/ringpop.go:84\ngo.temporal.io/server/common/membership.(*ringpopMonitor).Start\n\t/home/builder/temporal/common/membership/rpMonitor.go:135\ngo.temporal.io/server/common/resource.MembershipMonitorProvider.func1\n\t/home/builder/temporal/common/resource/fx.go:268\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:120\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).Start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:85\ngo.uber.org/fx.(*App).start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:683\ngo.uber.org/fx.withTimeout.func1\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:773"}
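For anyone triaging similar output, here is a minimal sketch (not part of the original report) that pulls the error and fatal entries out of Temporal's JSON log lines so the failing message stands out from the stack traces. The sample entries are abridged from the logs above with the stack traces omitted; the function name `failures` and the `SAMPLE_LOGS` list are made up for illustration.

```python
import json

# Entries abridged from the logs in this issue (stack traces omitted).
SAMPLE_LOGS = [
    '{"level":"info","ts":"2022-12-07T17:48:04.721Z","msg":"bootstrap hosts fetched","service":"worker"}',
    '{"level":"error","ts":"2022-12-07T17:48:46.557Z","msg":"unable to bootstrap ringpop. retrying","service":"worker","error":"join duration of 41.835509067s exceeded max 30s"}',
    '{"level":"fatal","ts":"2022-12-07T17:48:46.557Z","msg":"unable to bootstrap ringpop. exhausted all retries","service":"worker","error":"join duration of 41.835509067s exceeded max 30s"}',
]


def failures(lines):
    """Return (level, service, msg, error) for each error or fatal entry."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        if entry.get("level") in ("error", "fatal"):
            found.append(
                (entry["level"], entry.get("service"), entry.get("msg"), entry.get("error"))
            )
    return found


if __name__ == "__main__":
    for level, service, msg, error in failures(SAMPLE_LOGS):
        print(f"[{level}] {service}: {msg} ({error})")
```

With the sample entries, this keeps only the two ringpop bootstrap failures, both reporting that the join took longer than the 30s maximum.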

Environment/Versions

Additional context

The Temporal deployment runs on a KiND-based Kubernetes cluster:

KiND version: v0.17.0
K8s version: 1.25.3
Istio version: 1.16.0

mmlk09 commented 1 year ago

The solution described here fixed the issue: https://community.temporal.io/t/temporal-workload-unable-to-talk-to-each-other-when-strct-mtls-enabled-in-istio/6650/5

Can this fix please be included in the Temporal Helm charts?

deepakjeena commented 1 year ago

I am facing a similar issue:

kubectl get pods -n temporal

NAME                                   READY   STATUS             RESTARTS         AGE
temporal-admintools-67ddc7b669-k2dfs   2/2     Running            0                72m
temporal-frontend-697b57c864-9xlsx     1/2     CrashLoopBackOff   19 (22s ago)     72m
temporal-history-54f48dc5ff-nzqp8      1/2     CrashLoopBackOff   18 (5m8s ago)    72m
temporal-matching-55686b777c-m2ppd     1/2     CrashLoopBackOff   19 (40s ago)     72m
temporal-web-645b8fb6b4-bxl7m          2/2     Running            0                72m
temporal-worker-7bbb976c86-cf7jf       1/2     CrashLoopBackOff   19 (38s ago)     72m

kubectl logs temporal-frontend-697b57c864-9xlsx -n temporal

2023/01/29 16:01:44 Loading config; env=docker,zone=,configDir=config
2023/01/29 16:01:44 Loading config files=[config/docker.yaml]
Unable to load configuration: config file corrupted: yaml: unmarshal errors:
  line 19: cannot unmarshal !!str transac... into map[string]string
  line 32: cannot unmarshal !!str transac... into map[string]string.

robholland commented 3 weeks ago

@deepakjeena Your issue is not at all related to the one originally reported here.