[Bug] ElasticSearch index not created when using existing SQL/Cassandra

RonaldGalea commented 7 months ago

What are you really trying to do?

Deploy Temporal with an existing Database.

Describe the bug

When an existing DB is used, the suggested configuration is to disable the schema setup and update:

schema:
  setup:
    enabled: false
  update:
    enabled: false

However, this also prevents the index creation job from running: https://github.com/temporalio/helm-charts/blob/master/templates/server-job.yaml#L347

There should likely be a separate flag controlling the ElasticSearch index creation.

Minimal Reproduction

Just run any of the "Install with your own MySQL/PostgreSQL/Cassandra" examples. All server services will be stuck in Init with the message "waiting for elasticsearch index to become ready".

Additional context

There is this post on the community forum which might be related.

DanielCalvo commented 5 months ago

I stumbled upon this today while trying to configure temporal with an external PostgreSQL DB. I also found a workaround, but it isn't pretty.

We're deploying this through an ArgoCD application, and this is what it looks like:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
    name: temporal
    namespace: argocd
spec:
    syncPolicy:
        automated:
            selfHeal: true
            prune: true
    project: default
    destination:
        server: https://kubernetes.default.svc
        namespace: temporal
    source:
        path: charts/temporal
        repoURL: https://github.com/temporalio/helm-charts
        targetRevision: temporal-0.33.0
        helm:
            releaseName: temporal
            values: |-
                replicaCount: 1
                postgresql:
                  enabled: true
                prometheus:
                  enabled: true
                elasticsearch:
                  enabled: true
                grafana:
                  enabled: true
                cassandra:
                  enabled: false
                schema:
                  setup:
                    enabled: true
                  update:
                    enabled: true
                server:
                  config:
                    persistence:
                      default:
                        driver: sql
                        sql:
                          driver: postgres
                          host: temporal(...).eu-west-1.rds.amazonaws.com
                          port: 5432 #if you don't specify this, temporal defaults to port 3306 for postgresql, which is the default port for mysql!
                          user: postgresql
                          password: xxxx
                      visibility:
                        driver: sql
                        sql:
                          driver: postgres
                          host: temporal(...).eu-west-1.rds.amazonaws.com
                          port: 5432
                          user: postgresql
                          password: xxxx

All 4 temporal pods were stuck initializing. I only checked the worker pod, which was failing with:

waiting for elasticsearch index to become ready

This is due to the es-index-setup job not being created.

I ended up cloning the repo, checking out the tag I was using above, putting the helm values on a file by themselves and templating the chart locally:

helm template temporal /home/daniel/repos/temporal-helm-charts/charts/temporal/ --values temporal-values.yml

And strangely enough, this generated the YAML for the es-index-setup Job. I then applied it with kubectl from my machine, which initialized temporal's Elasticsearch instance, and now the pods are OK.

I ran out of time to troubleshoot why the helm chart has this strange behaviour. If it wasn't for this issue I would assume the problem was between my chair and keyboard, but now I'm not so sure.

Also it is worth bearing in mind that many temporal users will use a GitOps tool (likely ArgoCD or FluxCD) to deploy this helm chart, so it is also something worth validating.

Cheers

max-openline commented 4 months ago

The same issue is happening to me: my pods are stuck in the init state. I tried the workaround from @DanielCalvo, but it did not work in my case.

emanuel8x8 commented 2 months ago

Do you happen to have any updates on this? Our team is also affected by this issue.

smolinari commented 2 months ago

I seem to also have this issue upgrading from 0.36.0 to 0.37.0. It is my first upgrade.

Scott

brojonat commented 1 week ago

I also had this issue; after a Kubernetes update I realized my Temporal deployment wasn't in a good state. I found that the temporal-history container was waiting on this same elasticsearch index setup. Following along the lines of the comments above, in my temporal helm chart repo I set enabled: true in my values/values.postgresql.yaml (fixing the config OP pointed out), then I did:

helm template . --values values/values.postgresql.yaml > out.yml

Then you can search for temporal-es-index-setup in that output, stick that job into its own job.yml file and do:

kubectl apply -f job.yml

I had to kick a few of the pods to restart the deployment but then everything was working as normal.
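
For anyone who would rather not copy the Job out of the rendered output by hand, the extraction step can be sketched as a small shell helper. This is only a sketch: the function name extract_job_docs is made up for illustration. It assumes the rendered manifest separates documents with lines containing only `---` and that the Job's name contains es-index-setup, as it did in the output described above. Check the result before applying it.

```shell
# Hypothetical helper (not part of the chart): print every YAML document
# in a rendered manifest that is a Job and whose text matches a pattern.
#   $1 = pattern to look for, e.g. es-index-setup
#   $2 = path to the rendered manifest, e.g. out.yml
extract_job_docs() {
  awk -v pat="$1" '
    # Print the buffered document if it is a matching Job, then reset.
    function emit() {
      if (doc ~ /kind: Job/ && doc ~ pat) printf "---\n%s", doc
      doc = ""
    }
    # A line holding only "---" ends the current document.
    /^---[ \t]*$/ { emit(); next }
    # Any other line is appended to the current document.
    { doc = doc $0 "\n" }
    # The last document has no trailing separator.
    END { emit() }
  ' "$2"
}
```

With the out.yml produced by the helm template command above, `extract_job_docs es-index-setup out.yml > job.yml` should leave only the index setup Job in job.yml, which can then be applied with kubectl apply -f job.yml as before.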
