temporalio / helm-charts

Temporal Helm charts
MIT License

[Bug] Cannot deploy Temporal with self-hosted Postgres Instance on GCP #528

Closed. washeeeq closed this issue 1 month ago

washeeeq commented 1 month ago

What are you really trying to do?

Describe the bug

We are using a custom chart with Temporal as a subchart:

apiVersion: v2
name: temporal
description: A Helm chart with temporal as subchart
type: application
version: 0.1.0
appVersion: "0.1.0"
dependencies:
  - name: temporal
    version: v0.43.0
    repository: https://temporalio.github.io/helm-charts
    alias: temporal

Then we add values to modify the deployment:

temporal:
  server:
    config:
      persistence:
        default:
          driver: "sql"
          sql:
            driver: "postgres12"
            host: "10.100.100.3"
            port: 5432
            database: temporal
            user: temporal_app
            existingSecret: temporal-default-store
            maxConns: 20
            maxConnLifetime: "1h"
            tls:
              enabled: false
        visibility:
          driver: "sql"
          sql:
            driver: "postgres12"
            host: "10.100.100.3"
            port: 5432
            database: temporal_visibility
            user: temporal_visibility_app
            existingSecret: temporal-visibility-store
            maxConns: 20
            maxConnLifetime: "1h"
            tls:
              enabled: false
    dynamicConfig:
      frontend.globalNamespaceRPS: # Total per-Namespace RPC rate limit applied across the Cluster.
        - value: 5000
    frontend:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
      podAnnotations:
        helm.sh/hook: pre-install,pre-upgrade
        helm.sh/hook-weight: "-1"
      command: ["/bin/sh"]
      args: ["-c", "while true; do sleep 600; done"]
      # additionalVolumes: 
      #   - name: dynamic-config
      #     configMap:
      #       name: "temporal-dynamic-config"
      #       items:
      #       - key: dynamic_config.yaml
      #         path: dynamic_config.yaml
      additionalVolumeMounts:
        - mountPath: /etc/temporal/dynamic_config/dynamic_config.yaml
          name: dynamic-config
          subPath: dynamic_config.yaml
    history:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
    matching:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
    worker:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
  cassandra:
    enabled: false
  prometheus:
    enabled: false
  grafana:
    enabled: false
  # we are not deploying Postgres
  postgresql:
    enabled: false
  elasticsearch:
    enabled: true
    replicas: 2
    minimumMasterNodes: 1
    host: "elasticsearch-master"
    # this really causes an issue
    # external: true
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
      limits:
        cpu: "500m"
        memory: "4Gi"
  schema:
    createDatabase:
      enabled: false
    setup:
      enabled: true
    update:
      enabled: false

After deployment, the schema jobs for both Elasticsearch and Postgres start up and complete successfully (logs from the Postgres job):

2024-07-06T17:46:56.451Z    INFO   Starting schema setup   {"config": {"SchemaFilePath":"","SchemaName":"","InitialVersion":"0.0","Overwrite":false,"DisableVersioning":false}, "logging-call-at": "setuptask.go:63"}
2024-07-06T17:46:56.451Z    DEBUG  Setting up version tables   {"logging-call-at": "setuptask.go:73"}
2024-07-06T17:46:56.468Z    DEBUG  Setting initial schema version to 0.0   {"logging-call-at": "setuptask.go:136"}
2024-07-06T17:46:56.469Z    DEBUG  Updating schema update log  {"logging-call-at": "setuptask.go:141"}
2024-07-06T17:46:56.472Z    INFO   Schema setup complete   {"logging-call-at": "setuptask.go:149"}

But then the frontend, history, and two more pods fail to start with:

2024/07/06 17:09:31 Loading config; env=docker,zone=,configDir=config
2024/07/06 17:09:31 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2024-07-06T17:09:31.703Z","msg":"Build info.","git-time":"2024-06-21T18:27:06.000Z","git-revision":"645e72d2b917d1c1eb1d11ca7eac6332b3483e29","git-modified":false,"go-arch":"amd64","go-os":"linux","go-version":"go1.21.11","cgo-enabled":false,"server-version":"1.24.2","debug-mode":false,"logging-call-at":"main.go:148"}
{"level":"info","ts":"2024-07-06T17:09:31.703Z","msg":"dynamic config changed for the key: frontend.globalnamespacerps oldValue: nil newValue: { constraints: {} value: 5000 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-07-06T17:09:31.703Z","msg":"Updated dynamic config","logging-call-at":"file_based_client.go:195"}
{"level":"warn","ts":"2024-07-06T17:09:31.703Z","msg":"Not using any authorizer and flag `--allow-no-auth` not detected. Future versions will require using the flag `--allow-no-auth` if you do not want to set an authorizer.","logging-call-at":"main.go:178"}
[Fx] PROVIDE    fx.Lifecycle <= go.uber.org/fx.New.func1()
[Fx] PROVIDE    fx.Shutdowner <= go.uber.org/fx.(*App).shutdowner-fm()
[Fx] PROVIDE    fx.DotGraph <= go.uber.org/fx.(*App).dotGraph-fm()
[Fx] PROVIDE    *temporal.ServerImpl <= go.temporal.io/server/temporal.NewServerFxImpl()
[Fx] PROVIDE    *temporal.serverOptions <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    chan interface {} <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    temporal.synchronizationModeParams <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    *config.Config <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    *config.PProf <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    log.Config <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    resource.ServiceNames <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    resource.NamespaceLogger <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    resolver.ServiceResolver <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    client.AbstractDataStoreFactory <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    visibility.VisibilityStoreFactory <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    searchattribute.Mapper <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    []grpc.UnaryServerInterceptor <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    authorization.Authorizer <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    authorization.ClaimMapper <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    authorization.JWTAudienceMapper <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    map[primitives.ServiceName]static.Hosts <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    log.Logger <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    client.FactoryProvider <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    dynamicconfig.Client <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    encryption.TLSConfigProvider <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    *client.Config <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    client.Client <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    metrics.Handler <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE    *dynamicconfig.Collection <= go.temporal.io/server/common/dynamicconfig.NewCollection()
[Fx] PROVIDE    archiver.ArchivalMetadata <= go.temporal.io/server/common/resource.ArchivalMetadataProvider()
[Fx] PROVIDE    tasks.TaskCategoryRegistry <= go.temporal.io/server/temporal.TaskCategoryRegistryProvider()
[Fx] PROVIDE    client.FactoryProviderFn <= go.temporal.io/server/temporal.PersistenceFactoryProvider()
[Fx] PROVIDE    *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.HistoryServiceProvider()
[Fx] PROVIDE    *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.MatchingServiceProvider()
[Fx] PROVIDE    *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.FrontendServiceProvider()
[Fx] PROVIDE    *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.InternalFrontendServiceProvider()
[Fx] PROVIDE    *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.WorkerServiceProvider()
[Fx] PROVIDE    *cluster.Config <= go.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider()
[Fx] PROVIDE    config.Persistence <= go.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider()
[Fx] PROVIDE    *pprof.PProfInitializerImpl <= go.temporal.io/server/common/pprof.NewInitializer()
[Fx] PROVIDE    []trace.SpanExporter <= go.temporal.io/server/temporal.glob..func2()
[Fx] SUPPLY []temporal.ServerOption
[Fx] RUN    supply: stub([]temporal.ServerOption)
[Fx] RUN    provide: go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] Error returned: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider
    /home/runner/work/docker-builds/docker-builds/temporal/temporal/fx.go:183:
sql schema version compatibility check failed: version mismatch for keyspace/database: "temporal". Expected version: 1.12 cannot be greater than Actual version: 0.0
[Fx] ERROR      Failed to initialize custom logger: could not build arguments for function "go.uber.org/fx".(*module).constructCustomLogger.func2
    /home/runner/go/pkg/mod/go.uber.org/fx@v1.21.1/module.go:292:
failed to build fxevent.Logger:
could not build arguments for function "go.temporal.io/server/temporal".glob..func8
    /home/runner/work/docker-builds/docker-builds/temporal/temporal/fx.go:1009:
failed to build log.Logger:
received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider
    /home/runner/work/docker-builds/docker-builds/temporal/temporal/fx.go:183:
sql schema version compatibility check failed: version mismatch for keyspace/database: "temporal". Expected version: 1.12 cannot be greater than Actual version: 0.0
Unable to create server. Error: could not build arguments for function "go.uber.org/fx".(*module).constructCustomLogger.func2 (/home/runner/go/pkg/mod/go.uber.org/fx@v1.21.1/module.go:292): failed to build fxevent.Logger: could not build arguments for function "go.temporal.io/server/temporal".glob..func8 (/home/runner/work/docker-builds/docker-builds/temporal/temporal/fx.go:1009): failed to build log.Logger: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider (/home/runner/work/docker-builds/docker-builds/temporal/temporal/fx.go:183): sql schema version compatibility check failed: version mismatch for keyspace/database: "temporal". Expected version: 1.12 cannot be greater than Actual version: 0.0.

Minimal Reproduction

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: temporal
spec:
  destination:
    name: ''
    namespace: temporal
    server: 'https://kubernetes.default.svc'
  source:
    path: cloud/helm/infra/temporal
    repoURL: 'https://github.com/your_repo'
    targetRevision: feature/SRE-133--add_temporal
    helm:
      valueFiles:
        - overlays/dev/values.yaml
  sources: []
  project: google-cloud-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Environment/Versions

OS and processor: AMD64, Linux
Temporal Version: latest
Are you using Docker or Kubernetes or building Temporal from source: Kubernetes

robholland commented 1 month ago

If you have schema.update.enabled=false in your values, you need to run the initial schema update manually. schema.setup only creates the versioning information tables; it does not run the migrations that add the required tables etc. To bootstrap a system and then ensure that no updates are done automatically, deploy with schema.update enabled, then disable schema.update for future upgrades.
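
As a rough sketch of that suggestion, the values for the first install might look like the fragment below. It assumes the same layout as the values in the report (the chart used as a subchart under the `temporal:` key, with database creation disabled) and only changes the `schema.update` flag. This would be consistent with the logs above, where the setup job leaves the schema at version 0.0 while the server expects 1.12.

```yaml
# First install only: let the chart run the schema migrations once.
temporal:
  schema:
    createDatabase:
      enabled: false   # unchanged from the values in the report
    setup:
      enabled: true    # creates the schema versioning tables (initial version 0.0)
    update:
      enabled: true    # runs the migrations that add the required tables
```

Once the update job has completed and the server pods start, `schema.update.enabled` can be set back to `false` for later upgrades if schema changes should not be applied automatically.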