temporalio / helm-charts

Temporal Helm charts

[Bug] Unable to deploy Temporal using ArgoCD #521

Closed · washeeeq closed this 23 hours ago

washeeeq commented 4 days ago

What are you really trying to do?

Describe the bug

We are using GCP Postgres as both the default store and the visibility store, and we pull the Temporal chart in as a subchart:

apiVersion: v2
name: temporal
description: A Helm chart with temporal as subchart
type: application
version: 0.1.0
appVersion: "0.1.0"
dependencies:
  - name: temporal
    version: v0.43.0
    repository: https://temporalio.github.io/helm-charts
    alias: temporal

Then we add values to modify the deployment:

temporal:
  server:
    config:
      persistence:
        default:
          driver: "sql"
          sql:
            driver: "postgres12"
            host: "10.100.100.3"
            port: 5432
            database: temporal
            user: temporal_app
            existingSecret: temporal-default-store
            maxConns: 20
            maxConnLifetime: "1h"
            tls:
              enabled: false
        visibility:
          driver: "sql"
          sql:
            driver: "postgres12"
            host: "10.100.100.3"
            port: 5432
            database: temporal_visibility
            user: temporal_visibility_app
            existingSecret: temporal-visibility-store
            maxConns: 20
            maxConnLifetime: "1h"
            tls:
              enabled: false
    dynamicConfig:
      frontend.globalNamespaceRPS: # Total per-Namespace RPC rate limit applied across the Cluster.
        - value: 5000
    frontend:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
      podAnnotations:
        helm.sh/hook: pre-install,pre-upgrade
        helm.sh/hook-weight: "-1"
      command: ["/bin/sh"]
      args: ["-c", "while true; do sleep 600; done"]
      # additionalVolumes: 
      #   - name: dynamic-config
      #     configMap:
      #       name: "temporal-dynamic-config"
      #       items:
      #       - key: dynamic_config.yaml
      #         path: dynamic_config.yaml
      additionalVolumeMounts:
        - mountPath: /etc/temporal/dynamic_config/dynamic_config.yaml
          name: dynamic-config
          subPath: dynamic_config.yaml
    history:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
    matching:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
    worker:
      additionalEnv:
        - name: TEMPORAL_ADDRESS
          value: "temporal.super-genius.org"
  cassandra:
    enabled: false
  prometheus:
    enabled: false
  grafana:
    enabled: false
  # we are not deploying Postgres
  postgresql:
    enabled: false
  elasticsearch:
    enabled: true
    replicas: 2
    minimumMasterNodes: 1
    host: "elasticsearch-master"
    # this really causes an issue
    # external: true
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
      limits:
        cpu: "500m"
        memory: "4Gi"
  schema:
    createDatabase:
      enabled: false
    setup:
      enabled: true
    update:
      enabled: false

With this config, Temporal hangs while bringing up the pods: frontend, history, worker, and one more.

After further investigation I found out that the batch job that creates the index (es-index-setup) is not starting. Probably the wrong weight is used: "helm.sh/hook-weight": "0".

If I add external: true the script is triggered, but this then hinders the initial deployment.
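
My understanding (a rough sketch, not the chart's actual manifest) is that when ArgoCD renders the chart itself, it translates Helm hook annotations into its own sync hooks, so the job is ordered by ArgoCD sync phases and waves rather than by Helm's install phases. Illustratively:

# illustrative annotations only; the real hook and weight values come from the chart templates
metadata:
  annotations:
    helm.sh/hook: post-install     # ArgoCD maps Helm hooks onto its own phases, e.g. PostSync
    helm.sh/hook-weight: "0"       # ArgoCD maps hook weights onto argocd.argoproj.io/sync-wave

If the job ends up as a PostSync hook, ArgoCD only runs it once the app is healthy, while the server pods cannot become healthy until the index exists, which might be why it never starts here.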

After Elasticsearch is initialized, the frontend pod starts but very quickly exits with:

2024/07/01 09:25:14 Loading config; env=docker,zone=,configDir=config
2024/07/01 09:25:14 Loading config files=[config/docker.yaml]
Unable to load configuration: config file corrupted: yaml: line 30: found unknown escape character.

Minimal Reproduction

robholland commented 3 days ago

The hooks are now removed on main (replaced with a single job); that should improve things for you once that is released. The config issue will be unrelated; can you please check the configmap and see what line 30 of the config we generated there looks like?

washeeeq commented 1 day ago

Yes, but that is a bit unclear to me. I tried using a busybox image to connect to the container, but I see no config/docker.yaml there, so presumably you take config_template.yaml and somehow transform it into docker.yaml. Here is the configmap from the cluster:

apiVersion: v1
data:
  config_template.yaml: |-
    log:
      stdout: true
      level: "debug,info"

    persistence:
      defaultStore: default
      visibilityStore: es-visibility
      numHistoryShards: 512
      datastores:
        default:
          sql:
            pluginName: "postgres12"
            driverName: "postgres12"
            databaseName: "temporal"
            connectAddr: "10.100.100.3:5432"
            connectProtocol: "tcp"
            user: temporal_app
            password: "{{ .Env.TEMPORAL_STORE_PASSWORD }}"
            maxConnLifetime: 1h
            maxConns: 20
            secretName: ""
        visibility:
          sql:
            pluginName: "postgres12"
            driverName: "postgres12"
            databaseName: "temporal_visibility"
            connectAddr: "10.100.100.3:5432"
            connectProtocol: "tcp"
            user: "temporal_visibility_app"
            password: "{{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }}"
            maxConnLifetime: 1h
            maxConns: 20
            secretName: ""
        es-visibility:
            elasticsearch:
                version: "v7"
                url:
                    scheme: "http"
                    host: "elasticsearch-master:9200"
                username: ""
                password: ""
                logLevel: "error"
                indices:
                    visibility: "temporal_visibility_v1_dev"

    global:
      membership:
        name: temporal
        maxJoinDuration: 30s
        broadcastAddress: {{ default .Env.POD_IP "0.0.0.0" }}

      pprof:
        port: 7936

      metrics:
        tags:
          type: frontend
        prometheus:
          timerType: histogram
          listenAddress: "0.0.0.0:9090"

    services:
      frontend:
        rpc:
          grpcPort: 7233
          membershipPort: 6933
          bindOnIP: "0.0.0.0"

      history:
        rpc:
          grpcPort: 7234
          membershipPort: 6934
          bindOnIP: "0.0.0.0"

      matching:
        rpc:
          grpcPort: 7235
          membershipPort: 6935
          bindOnIP: "0.0.0.0"

      worker:
        rpc:
          grpcPort: 7239
          membershipPort: 6939
          bindOnIP: "0.0.0.0"
    clusterMetadata:
      enableGlobalDomain: false
      failoverVersionIncrement: 10
      masterClusterName: "active"
      currentClusterName: "active"
      clusterInformation:
        active:
          enabled: true
          initialFailoverVersion: 1
          rpcName: "temporal-frontend"
          rpcAddress: "127.0.0.1:7233"
    dcRedirectionPolicy:
      policy: "noop"
      toDC: ""
    archival:
      status: "disabled"

    publicClient:
      hostPort: "temporal-frontend:7233"

    dynamicConfigClient:
      filepath: "/etc/temporal/dynamic_config/dynamic_config.yaml"
      pollInterval: "10s"
kind: ConfigMap
metadata:
  creationTimestamp: '2024-07-06T16:45:27Z'
  labels:
    app.kubernetes.io/instance: temporal
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: temporal
    app.kubernetes.io/part-of: temporal
    app.kubernetes.io/version: 1.24.2
    argocd.argoproj.io/instance: temporal
    helm.sh/chart: temporal-0.43.0
  name: temporal-frontend-config
  namespace: temporal
  resourceVersion: '68519726'
  uid: ba3537f0-b0fe-4492-8588-3498b2a8a0f8

I think the problem is with this line: password: "{{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }}"

Claude gave a hint to do it this way: password: "{{ {{ .Env.TEMPORAL_VISIBILITY_STORE_PASSWORD }} }}"

It seems our password contains special characters.
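
To illustrate with a made-up password (not our real one): the template drops the raw value into a double-quoted YAML scalar, so any backslash sequence that YAML does not recognise as an escape corrupts the rendered file, e.g.:

# line in config_template.yaml
password: "{{ .Env.TEMPORAL_STORE_PASSWORD }}"

# rendered docker.yaml when TEMPORAL_STORE_PASSWORD is Pa\ss!word (hypothetical value)
password: "Pa\ss!word"    # \s is not a valid YAML escape -> "found unknown escape character"

That matches the error the frontend pod logs on startup.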

robholland commented 23 hours ago

Ok, this is a known issue; it is fixed in the latest Helm chart release.
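
Picking it up should just be a matter of bumping the pinned subchart version in your wrapper Chart.yaml, along these lines (the version below is a placeholder, use the latest published chart release):

dependencies:
  - name: temporal
    version: <latest chart release>   # placeholder
    repository: https://temporalio.github.io/helm-charts
    alias: temporal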

robholland commented 23 hours ago

Closing now, but please open a new issue if you are still having configmap issues with the latest release.