logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

grok applies tag_on_failure if last pattern doesn't match #49

Open webmstr opened 9 years ago

webmstr commented 9 years ago

If you have a grok filter with multiple matches and break_on_match set to false, the event will have tag_on_failure applied unless the last pattern matches. This is a change from 1.4.

filter {
    grok {
        match => { "message" => [
            "foo",
            "bar"
        ] }
        tag_on_failure => [ "failure" ]
        break_on_match => false
    }
}

Sending in this input:

foo
bar
spam

Gives:

{
   "message" => "foo",
  "@version" => "1",
"@timestamp" => "2015-07-06T22:40:00.454Z",
      "host" => "0.0.0.0",
      "tags" => [
    [0] "failure"
]
}
{
   "message" => "bar",
  "@version" => "1",
"@timestamp" => "2015-07-06T22:40:00.456Z",
      "host" => "0.0.0.0"
}

{
   "message" => "spam",
  "@version" => "1",
"@timestamp" => "2015-07-06T22:40:00.627Z",
      "host" => "0.0.0.0",
      "tags" => [
    [0] "failure"
]
}

In 1.4, only "spam" would be tagged as "failure".

jordansissel commented 9 years ago

Based on what you've provided, I agree this is a bug.

corey-hammerton commented 7 years ago

+1. I am still experiencing this, even in v3.2.3.

TheVastyDeep commented 5 years ago

@jordansissel The problem is in the inner loop in match_against_groks. The outer loop in the filter method retains the state of 'matched' just fine, but the inner loop overwrites 'matched' for each pattern (because it also uses it to store the set of matches), so whether tag_on_failure is applied depends only on the last pattern. Consider this:

input { generator { count => 1 lines => [ '{"foo": "DEBUG"}', '{"bar": "DEBUG"}'] } }
filter {
    json { source => "message" }
    grok {
        break_on_match => false
        match => {
            "foo" => [ "INFO", "DEBUG" ]
            "bar" => [ "DEBUG", "INFO" ]
        }
    }
}
output { stdout { } }

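The second event gets a _grokparsefailure tag, but the first does not. That is unexpected: both events contain "DEBUG", which matches one pattern in each list. The only difference is that "DEBUG" is the last pattern listed for "foo" but the first one listed for "bar".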
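
To make the overwrite easier to see, here is a minimal Ruby sketch of the behaviour described above. It is a simplified model and not the plugin's actual source: the method names match_all_buggy and match_all_fixed are hypothetical, and plain Regexp objects stand in for compiled grok patterns.

```ruby
# Simplified model of the loop described above; NOT the plugin's real code.
# Plain Regexp objects stand in for grok patterns (an assumption).

# Buggy variant: `matched` is reassigned for every pattern, so only the
# result of the last pattern survives the loop.
def match_all_buggy(patterns, value)
  matched = false
  patterns.each do |pattern|
    matched = pattern.match(value) # overwrites any earlier success
  end
  !!matched
end

# One possible fix: keep the per-pattern result in its own variable and
# record success separately, so an earlier match is never lost.
def match_all_fixed(patterns, value)
  matched = false
  patterns.each do |pattern|
    result = pattern.match(value)
    matched = true if result
  end
  matched
end

patterns_foo = [/INFO/, /DEBUG/] # "DEBUG" is the last pattern
patterns_bar = [/DEBUG/, /INFO/] # "DEBUG" is the first pattern

puts match_all_buggy(patterns_foo, "DEBUG") # true:  last pattern matched
puts match_all_buggy(patterns_bar, "DEBUG") # false: earlier match overwritten
puts match_all_fixed(patterns_bar, "DEBUG") # true:  any match counts
```

With break_on_match => false every pattern still has to be tried, so the fixed variant keeps iterating instead of returning early. It only stops the later patterns from erasing an earlier success.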