jaegertracing / helm-charts

Helm Charts for Jaeger backend
Apache License 2.0

[Feature]: set the 'number_of_replicas' in elasticsearch to a variable #533

Status: Open. Stevenpc3 opened this issue 6 months ago.

Stevenpc3 commented 6 months ago

Requirement

As a user who uses Elasticsearch as the data store and deploys it in single-node mode (or with multiple replicas), I need the indices Jaeger creates by default to match the number of Elasticsearch replicas I am using.

Problem

If I deploy Elasticsearch with a single replica, Jaeger creates its indices with number_of_replicas = 1 (either by default or intentionally, I am not sure which), which causes issues in the Elasticsearch instance.

What happens is that the index is created fine, but if the Elasticsearch pod goes down, then when it comes back up the cluster stays "yellow" and the pod does not become ready, because the index is waiting for its replica shards to be assigned to a second node. This can be fixed by setting number_of_replicas = 0, either manually or via an index template.
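
A rough way to see why a single node stays yellow: Elasticsearch never assigns two copies of the same shard (primary or replica) to the same node, so with N nodes at most N copies of each shard can be assigned and every extra replica stays unassigned. The helper below is hypothetical and only does that arithmetic for one index; it ignores other allocation rules such as disk watermarks or allocation filtering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the replica shards of one index that can never
# be assigned. Each shard has 1 primary + R replica copies, and each copy
# must sit on a different node, so at most N copies per shard are assigned.
# Usage: unassigned_replicas NODES PRIMARIES REPLICAS   (NODES >= 1)
unassigned_replicas() {
  local nodes="$1" primaries="$2" replicas="$3"
  local copies=$((1 + replicas))
  local missing_per_shard=$((copies - nodes))
  # Enough nodes for every copy: nothing is left unassigned.
  if [ "$missing_per_shard" -lt 0 ]; then
    missing_per_shard=0
  fi
  echo $((primaries * missing_per_shard))
}
```

For example, unassigned_replicas 1 5 1 prints 5 (one node, 5 primaries chosen only as an example, number_of_replicas = 1), which is enough to keep the cluster yellow, while unassigned_replicas 1 5 0 prints 0.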

In values.yaml, "replicas" for Elasticsearch was set to 1:

elasticsearch:
    imageTag: "7.17.3"
    replicas: 1
    fullnameOverride: "jaeger-elasticsearch"

The following modifications to values.yaml set the template correctly, specifically this section:

lifecycle:
  postStart:
    exec:
      command:

Full values.yaml:

jaeger:
  # -- enable or disable Jaeger
  enabled: true
  # -- version of Jaeger to use
  #tag:
  # -- Set the storage type to use for long term storage
  storage:
    type: elasticsearch
    elasticsearch:
      # make this a template that decides based on devMode and can configure properly
      host: "jaeger-elasticsearch"
      usePassword: false
      antiAffinity: "soft"

  # -- Preferred long term backend storage
  elasticsearch:
    imageTag: "7.17.3"
    replicas: 1
    fullnameOverride: "jaeger-elasticsearch"
    esConfig:
      elasticsearch.yml: |
        ingest.geoip.downloader.enabled: false
    lifecycle:
      postStart:
        exec:
          command:
            - bash
            - -c
            - |
              #!/bin/bash
              # Add a template to adjust number of shards/replicas
              TEMPLATE_NAME=no_replicas
              INDEX_PATTERN1="jaeger-span-*"
              INDEX_PATTERN2="jaeger-service-*"
              ES_URL=http://localhost:9200
              while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
              curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN1"\"','\""$INDEX_PATTERN2"\"'],"settings":{"number_of_replicas":"0"}}'
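
The nested quoting in the curl call above works but is hard to read, and it hard-codes the replica count. A hedged sketch of the same step split into small shell functions: build_legacy_template_body assembles the same JSON body for any number of index patterns, and put_legacy_template sends it to the legacy _template endpoint used above. Both function names and their argument order are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the postStart hook above.

# Print a legacy (_template) body that applies number_of_replicas to every
# index pattern passed after the first argument.
# Usage: build_legacy_template_body REPLICAS PATTERN...
build_legacy_template_body() {
  local replicas="$1"
  shift
  local patterns="" pattern
  for pattern in "$@"; do
    # Join the quoted patterns with commas (no comma before the first one).
    patterns="${patterns:+$patterns,}\"$pattern\""
  done
  printf '{"index_patterns":[%s],"settings":{"number_of_replicas":"%s"}}' \
    "$patterns" "$replicas"
}

# PUT the template to Elasticsearch.
# Usage: put_legacy_template URL NAME REPLICAS PATTERN...
put_legacy_template() {
  local es_url="$1" name="$2"
  shift 2
  curl -s -XPUT "$es_url/_template/$name" \
    -H 'Content-Type: application/json' \
    -d "$(build_legacy_template_body "$@")"
}
```

Inside the hook, the final curl line could then become put_legacy_template "$ES_URL" "$TEMPLATE_NAME" 0 "$INDEX_PATTERN1" "$INDEX_PATTERN2", with the functions defined earlier in the same script.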

Further examples of what happens can be found at:

https://medium.com/fred-thougths/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode-ce196e20ba95
https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

Proposal

Ensure that the number_of_replicas of the indices Jaeger creates is equal to the Elasticsearch replicas value minus 1.
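
The proposal boils down to simple arithmetic: with N Elasticsearch nodes, each shard can have at most N - 1 replicas assigned, and a single node needs 0. A minimal sketch, using a hypothetical function name, that derives the value from the Elasticsearch replicas setting and never goes below zero:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive number_of_replicas from the Elasticsearch
# node count (the chart's "replicas" value), as proposed: replicas - 1,
# clamped at 0.
# Usage: replicas_for_nodes NODES
replicas_for_nodes() {
  local nodes="$1"
  if [ "$nodes" -gt 1 ]; then
    echo $((nodes - 1))
  else
    echo 0
  fi
}
```

Note that N - 1 is the upper bound rather than the only sensible choice: on a 5-node cluster it means every node holds a copy of every shard, so a deployment might prefer to cap the value (for example at 1) on larger clusters.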

Open questions

No response

Stevenpc3 commented 2 months ago

I have found a simpler way to set this than the lifecycle hack on start:

collector:
  cmdlineParams:
    es.num-replicas: "0"

Since we use a wrapper chart, I am sure I can template that "0" into whatever value we need when running more replicas, but this does not seem straightforward when using the chart out of the box.
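
With the out-of-the-box chart, one way to avoid hard-coding "0" in a values file is to pass the value on the command line. Helm's --set syntax treats dots as nesting, so the dot inside the key es.num-replicas has to be escaped with a backslash. The sketch below prints that argument; the function name is hypothetical, and the key prefix is a parameter because in a wrapper chart like the one above the Jaeger values live under a subchart key such as jaeger.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Helm --set argument for the collector's
# es.num-replicas command-line parameter. The dot inside "es.num-replicas"
# is escaped so Helm treats it as part of the key, not as nesting.
# Usage: collector_replicas_set_arg REPLICAS [SUBCHART_PREFIX]
collector_replicas_set_arg() {
  local replicas="$1" prefix="${2:-}"
  printf '%s%s=%s\n' "${prefix:+$prefix.}" \
    'collector.cmdlineParams.es\.num-replicas' "$replicas"
}
```

For example: helm upgrade --install jaeger jaegertracing/jaeger --set "$(collector_replicas_set_arg 0)", where the release name jaeger and the repository alias jaegertracing are assumptions about how the chart was added.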

Stevenpc3 commented 1 month ago

This problem also exists with the Spark dependencies job. It defaults to a replica count of 1, so in Kubernetes the Elasticsearch cluster starts as yellow and then fails to come up. You have to remove the replica by exec'ing into the pod and setting number_of_replicas to 0 with curl:

curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/jaeger-dependencies-2024-05-11/_settings' -d '{"index.number_of_replicas" : 0}'

Replace the index name with the one that exists in your cluster.
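
Instead of looking up each dated index name, the Elasticsearch update index settings API also accepts a wildcard pattern as the target, so one request can cover every matching index. A minimal sketch, assuming Elasticsearch is reachable at the given URL from inside the pod; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: set number_of_replicas on every existing index
# matching a pattern, e.g. all jaeger-dependencies-* indices at once.

# Print the _settings request body for the given replica count.
replicas_settings_body() {
  printf '{"index.number_of_replicas":%s}' "$1"
}

# Usage: set_index_replicas URL PATTERN REPLICAS
set_index_replicas() {
  local es_url="$1" pattern="$2" replicas="$3"
  # Quoting the URL keeps the shell from expanding the * itself.
  curl -s -XPUT "$es_url/$pattern/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replicas_settings_body "$replicas")"
}
```

For example, set_index_replicas http://localhost:9200 'jaeger-dependencies-*' 0 updates every existing jaeger-dependencies index in one call. It only changes indices that already exist; indices created later still get the default unless a template covers them.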

I will test the same cmdlineParams setting for Spark and hope it fixes this.

Stevenpc3 commented 1 month ago

Setting cmdlineParams for Spark does not work. The only thing that did work was a lifecycle hook that adjusts the replicas on start when using a single master:

master:
      masterOnly: false
      replicaCount: 1
      lifecycleHooks:
        postStart:
          exec:
            command:
              - bash
              - -c
              - |
                #!/bin/bash
                # Add a template to adjust number of shards/replicas
                TEMPLATE_NAME=no_replicas
                # INDEX_PATTERN1="jaeger-span-*"
                # INDEX_PATTERN2="jaeger-service-*"
                INDEX_PATTERN1="jaeger-dependencies-*"
                ES_URL=http://localhost:9200
                while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
                curl -XPUT "$ES_URL/_index_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN1"\"'],"template":{"settings":{"number_of_replicas":"0"}}}'

In the new charts you can no longer set the jaeger-span or jaeger-service templates this way, because it throws errors. So, in total, you need the hook for Spark (jaeger-dependencies-*) and the cmdlineParams for the collector (jaeger-span-* and jaeger-service-*).
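
Both hooks above wait for Elasticsearch with an unbounded while loop. If Elasticsearch never answers with HTTP 200, the loop never ends, and Kubernetes documents that a hanging postStart hook keeps the container from reaching the Running state. A hedged variant with a timeout is sketched below; the function name, the CURL override (used here so the loop can be exercised with a stub) and the 120-second default are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the wait loop used in both hooks, with a timeout.
# CURL can be overridden (e.g. with a stub) and defaults to curl.
# Usage: wait_for_es URL [MAX_SECONDS]   (default 120 is an assumption)
wait_for_es() {
  local es_url="$1" max_wait="${2:-120}" waited=0
  while [ "$(${CURL:-curl} -s -o /dev/null -w '%{http_code}' "$es_url")" != "200" ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "Elasticsearch at $es_url not ready after ${max_wait}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

In a hook, wait_for_es "$ES_URL" 300 || exit 1 before the curl -XPUT would turn a stuck start into a failed hook. Kubernetes kills the container when a postStart hook fails, so the timeout should be generous enough for a normal Elasticsearch start.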