magnusbaeck / logstash-filter-verifier

grokparser failure when run with logstash-filter-verifier #64

Closed nagaraj151 closed 5 years ago

nagaraj151 commented 5 years ago

Hello,

I'm new to logstash-filter-verifier.

We are trying to verify input against a filter that uses the following grok pattern:

(?<messagetimestamp>%{TIMESTAMP_ISO8601} %{INT}) %{DATA:type_of_log}: %{GREEDYDATA:messagebody}

logstash-filter-verifier seems to tag the input with _grokparsefailure.

With the Logstash output turned on, we noticed that the expanded_pattern appears to contain an extra backslash. This might be the reason for the failure, since the original pattern matches the input in the grok debugger but the expanded pattern does not.

[2018-12-11T00:14:50,102][DEBUG][logstash.filters.grok ] Grok compiled OK {:pattern=>"(?<messagetimestamp>%{TIMESTAMP_ISO8601} %{INT}) %{DATA:type_of_log}: %{GREEDYDATA:messagebody}", :expanded_pattern=>"(?<messagetimestamp>(?:(?:(?>\\d\\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?:(?:[+-]?(?:[0-9]+)))) (?<DATA:type_of_log>.*?): (?<GREEDYDATA:messagebody>.*)"}

The sample log we were testing was 2018-10-25 19:39:56 +0000 mylogfile: {\"host\":\"190.168.1.1\"}
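
As a side check (a sketch, not part of the original report): the doubled backslashes in the debug output are consistent with the log printing the pattern as an escaped string literal, which by itself would not change the regex. The Python snippet below uses a simplified, hand-written stand-in for the expanded pattern. Python's `re` is not the regex engine grok uses, and the names `PATTERN` and `SAMPLE` are illustrative, so this only shows that a pattern of this shape matches the sample line.

```python
import re

# Simplified stand-in for the expanded grok pattern (assumption: this is NOT
# the exact TIMESTAMP_ISO8601 definition, only the same overall shape).
PATTERN = re.compile(
    r"(?P<messagetimestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2} [+-]?\d+) "
    r"(?P<type_of_log>.*?): "
    r"(?P<messagebody>.*)"
)

# The sample line from the report, with the shell/JSON escaping removed.
SAMPLE = '2018-10-25 19:39:56 +0000 mylogfile: {"host":"190.168.1.1"}'

match = PATTERN.match(SAMPLE)
fields = match.groupdict() if match else {}

# An escaped rendering of a string doubles each backslash; the underlying
# regex still contains a single backslash before "d".
single = r"\d"
escaped_rendering = repr(single)

print(fields)
print(len(single), escaped_rendering)
```

If the pattern itself matches, the next thing worth checking is which event field the filter reads from.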

We have tried both versions 1.5.0 and 1.4.2 of logstash-filter-verifier.

Appreciate any feedback. Nagaraj

magnusbaeck commented 5 years ago

Please provide the Logstash filter file you're testing and the corresponding testcase file.

nagaraj151 commented 5 years ago

filter {
  grok {
    match => {
      "log" => "(?<messagetimestamp>%{TIMESTAMP_ISO8601} %{INT}) %{DATA:type_of_log}: %{GREEDYDATA:messagebody}"
    }
  }
  if "_grokparsefailure" in [tags] {
    mutate {
      rename => { "log" => "messagebody" }
      remove_tag => "_grokparsefailure"
    }
  } else {
    mutate {
      remove_field => [ "log" ]
    }
    if [messagebody] =~ "\A\{.+\}\z" {
      json {
        skip_on_invalid_json => true
        source => "messagebody"
        remove_field => [ "messagebody" ]
      }
    }
    date {
      match => [ "messagetimestamp", "yyyy-MM-dd HH:mm:ss Z" ]
    }
    mutate {
      replace => {"messagetimestamp" => "%{@timestamp}"}
    }
  }
}

{
  "ignore": [ "host", "@timestamp" ],
  "testcases": [
    {
      "input": [
        "2018-10-25 19:39:56 +0000 mylogfile: {\"host\":\"190.168.1.1\"}} "
      ],
      "expected": [
        {
          "messagebody": "{\\\"host\\\":\\\"190.168.1.1\\\"}",
          "type_of_log": "mylogfile",
          "messagetimestamp": "2018-10-25 19:39:56 +0000"
        }
      ]
    }
  ]
}

magnusbaeck commented 5 years ago

Your grok filter matches the expression against the contents of the log field, but the input events don't have any field by that name. The input string ends up in the message field. If your actual data really has a log field, you need to adjust your testcase file to use the json_lines codec and turn the input string into a JSON string:

{
  ...,
  "codec": "json_lines",
  "input": [
    "{\"log\": \"2018-10-25 19:39:56 +0000 ...\"}"
  ],
  ...
}
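
One way to build such an input string without hand-escaping quotes is to let a JSON encoder do it. The sketch below is Python and not part of logstash-filter-verifier; the helper name `to_json_lines_input` is hypothetical, and the `log` field name is taken from the filter above. It also models, very roughly, why the plain string failed: read as a plain line, the text sits in `message`, so a grok on `log` has nothing to match.

```python
import json

def to_json_lines_input(raw_line, field="log"):
    """Wrap a raw log line in a one-field JSON object, serialized on one line."""
    return json.dumps({field: raw_line})

raw = '2018-10-25 19:39:56 +0000 mylogfile: {"host":"190.168.1.1"}'

# Candidate entry for the testcase file's "input" array.
input_entry = to_json_lines_input(raw)

# Rough model of the two cases (assumption: a simplification of what the
# codecs do, not actual Logstash code).
plain_event = {"message": raw}        # plain line: text lands in "message"
json_event = json.loads(input_entry)  # JSON line: text lands in "log"

print(input_entry)
print("log" in plain_event, "log" in json_event)
```

Because the testcase file is itself JSON, the entry has to be written there as a JSON string, which escapes its quotes once more; `json.dumps(input_entry)` produces that form.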

magnusbaeck commented 5 years ago

No further feedback from the reporter, so I'm assuming the problem was resolved or is otherwise no longer relevant.