uken / fluent-plugin-elasticsearch

Apache License 2.0

Logs aren't delivered to elasticsearch when using the elasticsearch_data_stream type #925

Closed lololozhkin closed 2 years ago

lololozhkin commented 2 years ago

Problem

I am not able to use the elasticsearch_data_stream type in fluentd, but this feature is very useful in my case. In version 5.1.0 I was able to use this type, but data_stream_ilm_name was added in version 5.1.1 and I want to use this parameter.

My logs aren't delivered to elasticsearch. Every minute I get this log message:

#0 got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `include?' for :data_stream_name:Symbol"

Stacktrace:

2021-10-27 06:17:42 +0000 [warn]: #0 got unrecoverable error in primary and no secondary error_class=NoMethodError error="undefined method `include?' for :data_stream_name:Symbol"
2021-10-27 06:17:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.1/lib/fluent/plugin/output.rb:777:in `extract_placeholders'
2021-10-27 06:17:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-elasticsearch-5.1.1/lib/fluent/plugin/out_elasticsearch_data_stream.rb:195:in `write'
2021-10-27 06:17:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.1/lib/fluent/plugin/output.rb:1178:in `try_flush'
2021-10-27 06:17:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.1/lib/fluent/plugin/output.rb:1490:in `flush_thread_run'
2021-10-27 06:17:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.1/lib/fluent/plugin/output.rb:498:in `block (2 levels) in start'
2021-10-27 06:17:42 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
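The error class in the stacktrace can be reproduced in plain Ruby. This is an illustrative sketch of the failure mode only, not the plugin's actual code: placeholder extraction calls `include?` on a parameter value, which works for a String but raises NoMethodError when the value is a Symbol (such as a symbolic default):

```ruby
# Minimal repro of the NoMethodError above (illustrative only):
# String responds to #include?, Symbol does not.
value   = "fluentd.${$.kubernetes.container_name}"  # explicitly set parameter
default = :data_stream_name                         # hypothetical symbol default

puts value.include?("${")   # true: placeholder detected in the string
begin
  default.include?("${")    # Symbol has no #include? method
rescue NoMethodError => e
  puts e.message            # "undefined method `include?' for ..."
end
```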

Steps to replicate

My config is below. Output section:

    <match **>
      @type elasticsearch_data_stream
      host "elasticsearch-master"
      port 9200
      suppress_type_name true
      data_stream_name "fluentd.${$.kubernetes.container_name}"
      data_stream_ilm_name "my-ilm"
      <buffer tag, $.kubernetes.container_name>
        flush_thread_count 5
        @type "file"
        path "/var/log/fluentd/buffer/"
        chunk_limit_size 10m
        total_limit_size 10g
        flush_mode interval
        flush_interval 1m
        overflow_action drop_oldest_chunk
        retry_type exponential_backoff
        retry_wait 5s
        retry_max_interval 60s
        retry_randomize true
        retry_forever true
      </buffer>
    </match>

Filter section:

      <filter kubernetes.**>
        @type                        parser
        @id                          try_parse_json_logs
        key_name                     log
        replace_invalid_sequence     true
        emit_invalid_record_to_error false
        reserve_data                 true
        reserve_time                 true
        remove_key_name_field        false
        hash_value_field             data

        <parse>
          @type multi_format
          <pattern>
            format      json
            json_parser json
            time_key    timestamp
          </pattern>
          <pattern>
            format      none
            message_key message
          </pattern>
        </parse>
      </filter>

      <filter kubernetes.**>
        @type                   kubernetes_metadata
        @id                     filter_kube_metadata
        skip_labels             false
        skip_container_metadata true
        skip_namespace_metadata true
        skip_master_url         true
        cache_size              10000
      </filter>

Expected Behavior or What you need to ask

My logs are pushed to elasticsearch datastream with specified ILM name.

Using Fluentd and ES plugin versions

addressable (2.8.0) bigdecimal (default: 1.4.1) bundler (2.2.24, default: 1.17.2) cmath (default: 1.0.0) concurrent-ruby (1.1.9) cool.io (1.7.1) csv (default: 3.0.9) date (default: 2.0.0) dbm (default: 1.0.0) domain_name (0.5.20190701) e2mmap (default: 0.1.0) elasticsearch (7.15.0) elasticsearch-api (7.15.0) elasticsearch-transport (7.15.0) elasticsearch-xpack (7.15.0) etc (default: 1.0.1) excon (0.87.0) faraday (1.8.0) faraday-em_http (1.0.0) faraday-em_synchrony (1.0.0) faraday-excon (1.1.0) faraday-httpclient (1.0.1) faraday-net_http (1.0.1) faraday-net_http_persistent (1.2.0) faraday-patron (1.0.0) faraday-rack (1.0.0) fcntl (default: 1.0.0) ffi (1.15.4) ffi-compiler (1.0.1) fiddle (default: 1.0.0) fileutils (default: 1.1.0) fluent-config-regexp-type (1.0.0) fluent-plugin-concat (2.5.0) fluent-plugin-dedot_filter (1.0.0) fluent-plugin-detect-exceptions (0.0.14) fluent-plugin-elasticsearch (5.1.1) fluent-plugin-grok-parser (2.6.2) fluent-plugin-json-in-json-2 (1.0.2) fluent-plugin-kubernetes_metadata_filter (2.9.1) fluent-plugin-multi-format-parser (1.0.0) fluent-plugin-parser-cri (0.1.1) fluent-plugin-prometheus (2.0.2) fluent-plugin-record-modifier (2.1.0) fluent-plugin-rewrite-tag-filter (2.4.0) fluent-plugin-systemd (1.0.5) fluentd (1.14.1) forwardable (default: 1.2.0) gdbm (default: 2.0.0) http (4.4.1) http-accept (1.7.0) http-cookie (1.0.4) http-form_data (2.3.0) http-parser (1.2.3) http_parser.rb (0.7.0) io-console (default: 0.4.7) ipaddr (default: 1.2.2) irb (default: 1.0.0) json (default: 2.1.0) jsonpath (1.1.0) kubeclient (4.9.2) logger (default: 1.3.0) lru_redux (1.1.0) matrix (default: 0.1.0) mime-types (3.3.1) mime-types-data (3.2021.0901) msgpack (1.4.2) multi_json (1.15.0) multipart-post (2.1.1) mutex_m (default: 0.1.0) netrc (0.11.0) oj (3.11.0) openssl (default: 2.1.2) ostruct (default: 0.1.0) prime (default: 0.1.0) prometheus-client (2.1.0) psych (default: 3.1.0) public_suffix (4.0.6) rake (13.0.6) rdoc (default: 6.1.2.1) recursive-open-struct 
(1.1.3) rest-client (2.1.0) rexml (default: 3.1.9.1) rss (default: 0.2.7) ruby2_keywords (0.0.5) scanf (default: 1.0.0) sdbm (default: 1.0.0) serverengine (2.2.4) shell (default: 0.7) sigdump (0.2.4) stringio (default: 0.0.2) strptime (0.2.5) strscan (default: 1.0.0) sync (default: 0.5.0) systemd-journal (1.4.2) thwait (default: 0.1.0) tracer (default: 0.1.0) tzinfo (2.0.4) tzinfo-data (1.2021.2) unf (0.1.4) unf_ext (0.0.8) webrick (1.7.0, default: 1.4.4) yajl-ruby (1.4.1) zlib (default: 1.0.0)


* ES version 7.15.0

fiscafusca commented 2 years ago

Hi @lololozhkin, thank you for bringing this up. I am currently looking for a solution, since it's affecting my work as well (and I wrote this code, so it's my responsibility). It's related to the default value assigned to the config parameters, and it shows up when either data_stream_ilm_name or data_stream_template_name is not manually set. For the time being, you can bypass the error by setting the data_stream_template_name parameter as well. When all the parameters are set, the data stream is created correctly and receives the logs. Sorry for the inconvenience, fixes incoming!
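Applied to the output section from the original report, the workaround would look like this sketch (the template name `my-template` is a placeholder of my own, not from the report; the buffer section is elided):

```
    <match **>
      @type elasticsearch_data_stream
      host "elasticsearch-master"
      port 9200
      suppress_type_name true
      data_stream_name "fluentd.${$.kubernetes.container_name}"
      data_stream_ilm_name "my-ilm"
      # workaround: set the template name explicitly as well
      data_stream_template_name "my-template"
      ...
    </match>
```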

lololozhkin commented 2 years ago

Hi @fiscafusca, thank you for your reply! Your temporary solution works, but I have some questions.

Why do we need the data_stream_ilm_name parameter when we specify data_stream_template_name? The _index_template may contain a definition of the ILM name itself. And even if the ILM name is not specified in the _index_template, data_stream_ilm_name is not applied to the template.

Could we specify some _component_templates in the fluentd configuration that will be applied to the created data stream? This feature would be very useful. Currently, when data_stream_ilm_name and data_stream_template_name aren't specified, a new data stream is created with a matching index_template. An index template has the field composed_of, which contains the list of _component_templates to be merged, and it would be great if we could set these _component_templates in the fluentd configuration. An example of composed_of is shown in the elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/set-up-a-data-stream.html

Waiting for updates, have a nice day!

fiscafusca commented 2 years ago

Hi @lololozhkin!

The data_stream_ilm_name parameter was just an addition for the case where the user doesn't have a template to specify but wants to apply an existing ILM policy. It's not one of the most likely cases, but I wanted to take it into account. Once the bug is fixed, one can simply set the template name without the policy, and the plugin will check whether an ILM policy is already specified in the template settings.

Unfortunately, component templates apparently are not supported yet on the plugin side (I don't think you can even set component templates when creating an index via the fluentd configuration). Since you need an index template for your data stream in any case, I am using an existing index template on Elastic with its component templates already set, and this works for me. However, it would be an interesting feature to ask the maintainers about! I only contributed on this matter because I needed to specify templates and ILM policies in the fluentd configuration, without having dozens of default templates/policies created by the plugin and left unused.

I will mention this issue as soon as I open the PR with the bug fix! Have a nice day!

applike-ss commented 2 years ago

I am trying to use elasticsearch data streams with fluentd and ran into the same issue. I also tried version 5.1.2, but I still get this error and am unsure whether the fix is incomplete or there is an issue with my setup:

2021-11-15 08:45:11 +0000 [info]: #0 Specified data stream does not exist. Will be created: <[404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [stream-my-fancy-stream-name]","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"stream-my-fancy-stream-name","index":"stream-my-fancy-stream-name"}],"type":"index_not_found_exception","reason":"no such index [stream-my-fancy-stream-name]","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"stream-my-fancy-stream-name","index":"stream-my-fancy-stream-name"},"status":404}>

My assumption is that I don't need any resources to exist beforehand and that fluentd would create everything needed to write logs via the data stream. Is that incorrect?

This is what my config looks like:

<store>
  @log_level debug
  <buffer tag>
    @type file
    chunk_limit_size 5M
    flush_at_shutdown true
    flush_interval 5s
    flush_mode interval
    flush_thread_count 8
    overflow_action drop_oldest_chunk
    path /fluentd/log/elastic-buffer
    retry_max_interval 30
    retry_max_times 100
    retry_timeout 1h
    total_limit_size 512M
  </buffer>
  @type elasticsearch_data_stream
  host "elasticsearch-master"
  port 9200
  data_stream_name stream-${tag}
  data_stream_template_name stream-${tag}
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  flush_mode interval
  flush_interval 5s
  suppress_type_name true
</store>

Or do I need to create the index alias (and the first write index) myself?

Maybe a quick setup guide in the README.md would help others too.

fiscafusca commented 2 years ago

Hi @applike-ss, that log is not an error and is not related to this issue, since data stream creation works exactly as it did before this feature update. It's just telling you that the data stream name you specified does not exist yet, so a data stream with that name will be created.

applike-ss commented 2 years ago

@fiscafusca but in fact when I check, I see that neither the stream nor the index lifecycle policy is being created. I do see these logs:

Could not communicate to Elasticsearch, resetting connection and trying again. [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"no matching index template found for data stream [stream-my-fancy-stream-name]"}],"type":"illegal_argument_exception","reason":"no matching index template found for data stream [stream-my-fancy-stream-name]"},"status":400}

That is why I assume that I might need to create something (index/index template/index alias) myself.

applike-ss commented 2 years ago

I just looked at the code, and it seems that for dynamic data stream names it tries to create resources in a different order (unless I, as a non-Ruby guy, am reading it wrong). This is the code for static stream names:

    unless @use_placeholder
      begin
        @data_stream_names = [@data_stream_name]
        create_ilm_policy(@data_stream_name, @data_stream_template_name, @data_stream_ilm_name, @host)
        create_index_template(@data_stream_name, @data_stream_template_name, @data_stream_ilm_name, @host)
        create_data_stream(@data_stream_name)
      rescue => e
        raise Fluent::ConfigError, "Failed to create data stream: <#{@data_stream_name}> #{e.message}"
      end
    end

from: https://github.com/uken/fluent-plugin-elasticsearch/blob/master/lib/fluent/plugin/out_elasticsearch_data_stream.rb#L42

And here is the part that I assume is for dynamic data stream names:

    unless @data_stream_names.include?(data_stream_name)
      begin
        create_data_stream(data_stream_name)
        create_ilm_policy(data_stream_name, data_stream_template_name, data_stream_ilm_name, host)
        create_index_template(data_stream_name, data_stream_template_name, data_stream_ilm_name, host)
        @data_stream_names << data_stream_name
      rescue => e
        raise Fluent::ConfigError, "Failed to create data stream: <#{data_stream_name}> #{e.message}"
      end
    end

https://github.com/uken/fluent-plugin-elasticsearch/blob/master/lib/fluent/plugin/out_elasticsearch_data_stream.rb#L200

Elasticsearch complains that it can't find an index template for the data stream, which would make sense to me if the plugin tries to create the data stream before the index template and the index lifecycle policy. Am I wrong here?
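If that reading is right, the fix would be to mirror the static path's call order in the dynamic path. Here is a minimal, runnable sketch with stubbed helpers (the class and method bodies below are illustrative, not the plugin's real implementation; the stubs only record the call order):

```ruby
# Illustrative sketch of the suggested ordering: create the ILM policy
# and the index template *before* the data stream, as the static path
# already does, so ES can find a matching template for the new stream.
class DataStreamSetup
  attr_reader :calls

  def initialize
    @calls = []
    @data_stream_names = []
  end

  # Stubs standing in for the plugin's helpers; they only log the order.
  def create_ilm_policy(name)
    @calls << :ilm_policy
  end

  def create_index_template(name)
    @calls << :index_template
  end

  def create_data_stream(name)
    @calls << :data_stream
  end

  def ensure_stream(name)
    return if @data_stream_names.include?(name)
    create_ilm_policy(name)
    create_index_template(name)
    create_data_stream(name)  # last: its template now exists
    @data_stream_names << name
  end
end

setup = DataStreamSetup.new
setup.ensure_stream("stream-my-fancy-stream-name")
setup.ensure_stream("stream-my-fancy-stream-name")  # no-op on repeat
puts setup.calls.inspect  # [:ilm_policy, :index_template, :data_stream]
```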

fiscafusca commented 2 years ago

@applike-ss, thank you for the detailed description; I was now able to reproduce your issue. I think you may be right: the order of the function calls for dynamic data streams is incorrect and probably at the root of this problem. I will open a pull request to have this oversight fixed ASAP.

applike-ss commented 2 years ago

That's great! I was actually about to try to fix it myself and would have opened a PR :see_no_evil:

cosmo0920 commented 2 years ago

#928 is merged. Closing.