graylog-labs / graylog-plugin-metrics-reporter

Graylog Metrics Reporter Plugins
https://www.graylog.org/
GNU General Public License v3.0

ES Indexing errors with this plugin #47

Open TotalGriffLock opened 3 years ago

TotalGriffLock commented 3 years ago

I'm using Graylog 4.11 with version 3.0.0 of the metrics-reporter-gelf plugin to log metrics back into Graylog. I've done no plugin configuration beyond setting

metrics_gelf_enabled = true

in server.conf.

Most metrics are being logged every 15 seconds as expected, but some are evidently being dropped, as I have 100k indexing failures. I've narrowed it down to this plugin by routing all messages from my GELF input into a separate index. The only thing generating GELF messages into that input is this plugin. The input only listens on localhost, so it isn't outside interference.

Every 5 minutes I get these indexer failures:

Timestamp | Index | Letter ID | Error message
-- | -- | -- | --
a few seconds ago | gelf_0 | 0786ab1e-f535-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id '0786ab1e-f535-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];
a few seconds ago | gelf_0 | 0785c096-f535-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id '0785c096-f535-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a few seconds ago | gelf_0 | fe953d41-f534-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'fe953d41-f534-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a few seconds ago | gelf_0 | fe95b270-f534-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'fe95b270-f534-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];
a few seconds ago | gelf_0 | f5aafb66-f534-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'f5aafb66-f534-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];
a few seconds ago | gelf_0 | f5aa8648-f534-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'f5aa8648-f534-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a minute ago | gelf_0 | ecb82e12-f534-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'ecb82e12-f534-11eb-8a1b-00155d366e62'. Preview of field's value: '[]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "[]"]];
a minute ago | gelf_0 | ecb87c4e-f534-11eb-8a1b-00155d366e62 | ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [value] of type [long] in document with id 'ecb87c4e-f534-11eb-8a1b-00155d366e62'. Preview of field's value: 'Wed Aug 04 15:01:52 UTC 2021']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=For input string: "Wed Aug 04 15:01:52 UTC 2021"]];

My understanding is that Graylog will have calculated the field types for this input based on the message content and set that as the index's template in ES. Field refresh on this index is set to 5 seconds. I assume that a timestamp is being logged into a field which the ES indexer has determined should be a long, and likewise a value of [] into a field defined as a long. So I think this could be resolved with a static ES template for this index?
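The assumption above can be illustrated with a rough simulation. This is only a sketch of the "first value seen decides the field type" behaviour of dynamic mapping, not Elasticsearch's actual implementation; the function names `infer_field_type` and `fits_long` are made up for illustration, and the sample values are taken from the error previews above.

```python
def infer_field_type(first_value):
    """Roughly mimic dynamic mapping: the first value indexed fixes the type."""
    if isinstance(first_value, bool):
        return "boolean"
    if isinstance(first_value, int):
        return "long"
    if isinstance(first_value, float):
        return "float"
    return "keyword"  # strings, as in the store_generic dynamic template


def fits_long(value):
    """Return True if the value can be parsed as an integer."""
    try:
        int(str(value))
        return True
    except ValueError:
        return False


# A numeric metric arrives first, then the two problematic values.
samples = [15, "Wed Aug 04 15:01:52 UTC 2021", "[]", 42]
field_type = infer_field_type(samples[0])
rejected = [v for v in samples if field_type == "long" and not fits_long(v)]

print(field_type)
print(rejected)
```

In this sketch the `value` field becomes a long as soon as the first numeric metric is indexed, and the timestamp string and `[]` are then the only values rejected, which matches the pattern in the failure list.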

Any suggestions as to how to resolve this gratefully received.

TotalGriffLock commented 3 years ago

Here's the dynamic template generated for this index (and therefore this plugin's messages because nothing else logs to that input)

$ curl -X GET "localhost:9200/_template/gelf-template?pretty=true"

{
  "gelf-template" : {
    "order" : -1,
    "index_patterns" : [
      "gelf_*"
    ],
    "settings" : {
      "index" : {
        "analysis" : {
          "analyzer" : {
            "analyzer_keyword" : {
              "filter" : "lowercase",
              "tokenizer" : "keyword"
            }
          }
        }
      }
    },
    "mappings" : {
      "_source" : {
        "enabled" : true
      },
      "dynamic_templates" : [
        {
          "internal_fields" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string",
            "match" : "gl2_*"
          }
        },
        {
          "store_generic" : {
            "mapping" : {
              "type" : "keyword"
            },
            "match_mapping_type" : "string"
          }
        }
      ],
      "properties" : {
        "gl2_processing_timestamp" : {
          "format" : "uuuu-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "gl2_accounted_message_size" : {
          "type" : "long"
        },
        "gl2_receive_timestamp" : {
          "format" : "uuuu-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        },
        "full_message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "streams" : {
          "type" : "keyword"
        },
        "source" : {
          "fielddata" : true,
          "analyzer" : "analyzer_keyword",
          "type" : "text"
        },
        "message" : {
          "fielddata" : false,
          "analyzer" : "standard",
          "type" : "text"
        },
        "timestamp" : {
          "format" : "uuuu-MM-dd HH:mm:ss.SSS",
          "type" : "date"
        }
      }
    },
    "aliases" : { }
  }
}

The only field which is a [long] is gl2_accounted_message_size. So is this plugin causing that field to sometimes contain the timestamp or a null value?

TotalGriffLock commented 3 years ago

I have resolved this myself, via https://community.graylog.org/t/graylog-metrics-plugin-feeding-data-via-gelf-to-graylog-causing-parsing-errors/16356/3

Most metric values are numbers, so Graylog/ES correctly decide to store the "value" field as a [long]. However, there are two metrics (at the time of writing), org.graylog2.journal.oldest-segment and jvm.threads.deadlocks, where the value is either a string (a timestamp) or a collection/array. This data will not fit in a field of type long. The Graylog community URL above provides a solution, but only for one specific metric. I've put the GELF metrics input through a pipeline with the following rule, which has resolved the errors for me and should keep working as new non-numeric metrics are added:

Rule "Cleanup: Non-numeric metrics value field"
when
  has_field("value") AND not is_long("value")
then
  rename_field(old_field: "value", new_field: "value_string");
end

TotalGriffLock commented 3 years ago

That didn't appear to be working either, but this does. Can't spend any more time on it right now, but if anyone else is having the same problem this will fix it.

Rule "Cleanup: Non-numeric metrics value field"
when
  has_field("name") AND has_field("value") AND
  (to_string($message.name) == "org.graylog2.journal.oldest-segment" OR to_string($message.name) == "jvm.threads.deadlocks")
then
  let value_string = to_string($message.value);
  set_field("value_string", value_string);
  remove_field("value");
end
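
For anyone who wants to reason about the rule outside Graylog, here is a minimal Python sketch of the same idea: move a non-numeric "value" into "value_string" so that "value" only ever holds numbers. The function name `split_value` and the plain-dict message shape are assumptions for illustration and are not part of the plugin or of Graylog. Unlike the rule above, the sketch checks the type of the value rather than the metric name. One possible reason the earlier, more general rule misbehaved is that is_long("value") may test the literal string "value" rather than the field's content; that is a guess and is not confirmed anywhere in this thread.

```python
def split_value(message):
    """Return a copy of the message with a non-numeric `value`
    moved to `value_string` (stringified), leaving numeric values alone."""
    result = dict(message)
    if "value" not in result:
        return result
    value = result["value"]
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number:
        result["value_string"] = str(result.pop("value"))
    return result


numeric = split_value({"name": "jvm.threads.count", "value": 42})
deadlocks = split_value({"name": "jvm.threads.deadlocks", "value": []})
oldest = split_value({
    "name": "org.graylog2.journal.oldest-segment",
    "value": "Wed Aug 04 15:01:52 UTC 2021",
})

print(numeric)
print(deadlocks)
print(oldest)
```

The metric name jvm.threads.count in the usage lines is only a placeholder for "some numeric metric"; the other two names and the sample values come from the messages discussed above.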