ILM Keeps on Overwriting/Updating the Index Lifecycle Policy every minute

carlcauchi commented 3 years ago

Problem

With the option '"ilm_policy_overwrite" = true', I see on the Elasticsearch side that the plugin tries to update the ILM policy every 60 seconds even though nothing has changed. As a result, the version number of the lifecycle policy on the Elasticsearch side is bumped every minute.

This is the INFO message I see every 60 seconds on the Elasticsearch node:

{"type": "server", "timestamp": "2021-04-14T20:56:37,064Z", "level": "INFO", "component": "o.e.x.i.a.TransportPutLifecycleAction", "cluster.name": "docker-cluster", "node.name": "elasticsearch-844c684c58-hkjl8", "message": "updating index lifecycle policy [k8s-lifecycle-policy]", "cluster.uuid": "YKxZinsuTiqg6PJ1RqE6QA", "node.id": "nSJnQ_y1QvC3b1e1W-cXVw" }
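To confirm the 60-second cadence from the node logs, a small script can pull out the matching entries and measure the gaps between them. This is a minimal sketch that assumes the log holds one JSON object per line in the format shown above; the helper name `update_intervals` and the second sample entry are made up for illustration.

```python
import json
from datetime import datetime

MESSAGE_PREFIX = "updating index lifecycle policy"


def update_intervals(lines, policy_id):
    """Return the gaps in seconds between consecutive policy-update log entries."""
    stamps = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if entry.get("message") == f"{MESSAGE_PREFIX} [{policy_id}]":
            # The sample timestamp uses a comma before the milliseconds.
            stamps.append(
                datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%S,%fZ")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
    '{"timestamp": "2021-04-14T20:56:37,064Z", '
    '"message": "updating index lifecycle policy [k8s-lifecycle-policy]"}',
    # Hypothetical second entry, one minute after the real one above.
    '{"timestamp": "2021-04-14T20:57:37,064Z", '
    '"message": "updating index lifecycle policy [k8s-lifecycle-policy]"}',
]
print(update_intervals(sample, "k8s-lifecycle-policy"))
```

Gaps that sit consistently at 60 seconds would match the behavior described in this report.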

Steps to replicate

    "enable_ilm" = true
    "ilm_policy_id" = "k8s-lifecycle-policy"
    "ilm_policy" = "{\"policy\": {\"phases\": {\"hot\": {\"min_age\": \"0ms\",\"actions\": {\"rollover\": {\"max_age\": \"30d\",\"max_size\": \"50gb\"},\"set_priority\": {\"priority\": 100}}},\"delete\": {\"min_age\": \"30d\",\"actions\": {\"delete\": {\"delete_searchable_snapshot\": true}}}}}}"
    "ilm_policy_overwrite" = true

Expected Behavior or What you need to ask

The lifecycle policy should be updated only when the CRD configuration changes, not every 60 seconds.

@cosmo0920 @gihad maybe you can help out on this one

cosmo0920 commented 3 years ago

With the option '"ilm_policy_overwrite" = true', I see on the Elasticsearch side that the plugin tries to update the ILM policy every 60 seconds even though nothing has changed. As a result, the version number of the lifecycle policy on the Elasticsearch side is bumped every minute.

This could happen during flushing. Are you using flush_interval 60s (the default value)?

The lifecycle policy should be updated only when the CRD configuration changes, not every 60 seconds.

Hmm, I'm not against adding this mechanism. But for your use case, isn't it enough to specify ilm_policy_overwrite false? Are there other requirements for your use case?

carlcauchi commented 3 years ago

@cosmo0920 I can confirm that setting ilm_policy_overwrite false stops the constant updating of the lifecycle policy. My concern with leaving this option set to false is that configuration changes would then not be applied.

The option flush_interval is not set, so I guess it defaults to 60 seconds. Would that cause the lifecycle policy to be updated even though no changes have been applied to it?

carlcauchi commented 3 years ago

@cosmo0920 this is a full extract of the configuration I'm using:


"spec" = {
      "elasticsearch" = {
        "host" = "elasticsearch.elk"
        #"hosts" = join(",", data.aws_instances.elasticsearch_instances.private_ips)
        "index_name" = "k8s"
        "utc_index" = true
        "logstash_dateformat" = "%Y-%m-%d"
        "logstash_format" = true
        "logstash_prefix" = "k8s-${replace(local.full_env_name, ".", "-")}"
        "include_timestamp" = true
        "password" = {
          "valueFrom" = {
            "secretKeyRef" = {
              "key" = "password"
              "name" = kubernetes_secret.banzaicloud-elasticsearch-auth.metadata[0].name
            }
          }
        }
        "port" = var.logging_elasticsearch_port
        "scheme" = var.logging_elasticsearch_scheme
        "user" = data.external.env.result.ELASTICSEARCH_BANZAICLOUD_USER

        "buffer" = {
          "timekey" = "1m"
          "timekey_wait" = "30s"
          "timekey_use_utc" = true
        }

        "fail_on_putting_template_retry_exceed" = true
        "reconnect_on_error" = true
        "reload_on_failure" = true
        "verify_es_version_at_startup" = true
        "default_elasticsearch_version" = "7"
        "suppress_type_name" = true

        "enable_ilm" = true
        "ilm_policy_id" = "k8s-lifecycle-policy"
        "ilm_policy" = "{\"policy\": {\"phases\": {\"hot\": {\"min_age\": \"0ms\",\"actions\": {\"rollover\": {\"max_age\": \"30d\",\"max_size\": \"50gb\"},\"set_priority\": {\"priority\": 100}}},\"delete\": {\"min_age\": \"30d\",\"actions\": {\"delete\": {\"delete_searchable_snapshot\": true}}}}}}"
        "ilm_policy_overwrite" = true

        "template_name" = "k8s-index-template"
        "template_overwrite" = true

        "template_file" = {
          "mountFrom" = {
            "secretKeyRef" = {
              "key" = "index_template.json"
              "name" = kubernetes_secret.banzaicloud-elasticsearch-indextemplate.metadata[0].name
            }
          }
        }

      }
    }

carlcauchi commented 3 years ago

And this is the JSON file for the index template:

{
    "order": 0,
    "index_patterns": [
      "k8s-*"
    ],
    "settings": {
      "index": {
        "lifecycle": {
          "name": "k8s-lifecycle-policy"
        },
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "dynamic": true,
      "numeric_detection": false,
      "date_detection": true,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ],
      "_source": {
        "enabled": true,
        "includes": [],
        "excludes": []
      },
      "_routing": {
        "required": false
      },
      "dynamic_templates": []
    }
  }

cosmo0920 commented 3 years ago

My concern with leaving this option set to false is that configuration changes would then not be applied.

Currently, the Elasticsearch plugin is not aware of policy changes. It just passes the policy through to the Elasticsearch cluster.

this is a full extract of the configuration I'm using:

Should this be handled by DevOps tooling such as Puppet? If you use Puppet or Chef to manage the Fluentd configuration, you should restart the Fluentd instances when the configuration changes. I think introducing a policy change detection mechanism is too heavy for your use case.

I guess implementing policy change detection would be the last resort.
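For reference, the kind of change detection discussed here could look roughly like the following. This is a sketch in Python rather than the plugin's own code, and it is not how the plugin currently behaves; `policy_needs_update` is a hypothetical helper. It assumes the stored policy has already been fetched with `GET _ilm/policy/<policy_id>`, whose response wraps the policy body under the policy id together with `version` and `modified_date` fields. The version number in the sample is invented.

```python
import json


def policy_needs_update(get_response, policy_id, desired_policy_json):
    """Return True when the desired policy differs from the stored one."""
    stored = get_response.get(policy_id)
    if stored is None:
        return True  # the policy does not exist yet
    desired = json.loads(desired_policy_json)["policy"]
    # Dict comparison ignores key order, so only real differences count.
    return stored.get("policy") != desired


# Same policy as in the report, built as a dict for readability.
desired = json.dumps({"policy": {"phases": {
    "hot": {"min_age": "0ms", "actions": {
        "rollover": {"max_age": "30d", "max_size": "50gb"},
        "set_priority": {"priority": 100}}},
    "delete": {"min_age": "30d", "actions": {
        "delete": {"delete_searchable_snapshot": True}}}}}})

# Hypothetical GET response holding an identical stored policy.
stored_response = {"k8s-lifecycle-policy": {
    "version": 42,
    "policy": json.loads(desired)["policy"]}}

print(policy_needs_update(stored_response, "k8s-lifecycle-policy", desired))
```

With a check like this, the PUT would be skipped whenever the function returns False. Elasticsearch may normalize a policy when storing it, so a strict equality check could still report differences that are not real changes; a production version would need to account for that.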

cosmo0920 commented 3 years ago

    "buffer" = {
     "timekey" = "1m"
     "timekey_wait" = "30s"
     "timekey_use_utc" = true
    }

And your setting asks the Elasticsearch plugin to do time-sliced flushing every 1 minute (60 seconds). That's why the Elasticsearch plugin sends the policy to Elasticsearch every 60 seconds.
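The arithmetic behind this can be sketched as follows. The model is an assumption based on the explanation above: one policy PUT per time-sliced flush while `ilm_policy_overwrite` is true, with each PUT bumping the policy version by one. `parse_timekey` and `policy_updates` are illustrative helpers, not plugin functions.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_timekey(value):
    """Convert a duration such as "1m" or "30s" to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def policy_updates(window_seconds, timekey, overwrite):
    """Estimate how many policy PUTs happen within the given window."""
    if not overwrite:
        return 0  # assumed: no repeated PUT when overwrite is disabled
    return window_seconds // parse_timekey(timekey)


print(policy_updates(3600, "1m", True))    # 60 updates per hour
print(policy_updates(86400, "1m", True))   # 1440 updates per day
print(policy_updates(86400, "1m", False))  # 0
```

Under this model, a larger `timekey` lowers the update frequency proportionally, for example "1h" would give 24 updates per day. That trades against how quickly buffered logs are flushed.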
