logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

Added: support for timeout_scope #153

Closed: kares closed this 4 years ago

kares commented 4 years ago

This is a new feature (off by default) meant to reduce the high cost of timeout enforcement (a follow-up to https://github.com/logstash-plugins/logstash-filter-grok/pull/147).

It is an attempt to address https://github.com/logstash-plugins/logstash-filter-grok/issues/152.
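To make the idea concrete, here is a minimal Ruby sketch of the two ways a timeout can be scoped. It assumes Ruby's standard `Timeout.timeout` as a stand-in for the plugin's own timeout enforcer and uses plain regexes (`\d+` standing in for `%{NUMBER}`); `CountingTimeout`, `match_per_pattern` and `match_per_event` are hypothetical names for illustration only. Scoping per pattern arms one timer for every pattern attempted, while scoping per event (the behaviour this thread enables with `timeout_grouped => true`) arms a single timer around all attempts.

```ruby
require 'timeout'

# Hypothetical wrapper around Ruby's stdlib Timeout that counts how many
# timers get armed, so the two scoping strategies can be compared.
class CountingTimeout
  attr_reader :armed

  def initialize(seconds)
    @seconds = seconds
    @armed = 0
  end

  def run(&block)
    @armed += 1
    Timeout.timeout(@seconds, &block)
  end
end

# Ten patterns shaped like the PR's smoke test
# (\d+ is a simplification of grok's NUMBER pattern).
PATTERNS = (1..10).map { |i| /\Afoo#{i}: (?<bar>\d+)/ }

# Per-pattern scope: a separate timer for every regex attempt.
def match_per_pattern(message, timer)
  PATTERNS.each do |re|
    m = timer.run { re.match(message) }
    return m if m
  end
  nil
end

# Per-event scope: one timer guards all attempts for the event.
def match_per_event(message, timer)
  timer.run do
    m = nil
    PATTERNS.find { |re| m = re.match(message) }
    m
  end
end

per_pattern = CountingTimeout.new(30) # 30 s, like timeout_millis => 30000
per_event   = CountingTimeout.new(30)
match_per_pattern("aaaaaaaaaa", per_pattern)
match_per_event("aaaaaaaaaa", per_event)
puts "per-pattern timers: #{per_pattern.armed}" # 10
puts "per-event timers:   #{per_event.armed}"   # 1
```

For an event that no pattern matches, the per-pattern variant pays the timer setup ten times; the per-event variant pays it once.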

Remaining work to wrap this one up:

- eventually, a performance test guard (could be added later).

kares commented 4 years ago

"Smoke" performance test, using 10 simple, always-failing patterns:

input {
  generator {
    lines => ["aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "ddddddddddd", "eeeeeeeeee"]
    count => 1000000
  }
}

filter {
  grok {
    timeout_millis => 30000
    match => {
      "message" => [
        "foo1: %{NUMBER:bar}", "foo2: %{NUMBER:bar}", "foo3: %{NUMBER:bar}", "foo4: %{NUMBER:bar}", "foo5: %{NUMBER:bar}",
        "foo6: %{NUMBER:bar}", "foo7: %{NUMBER:bar}", "foo8: %{NUMBER:bar}", "foo9: %{NUMBER:bar}", "foo10: %{NUMBER:bar}"
      ]
    }
  }
}

output { stdout { codec => dots {} } }

Results (three runs per configuration):

| Plugin | Settings | Throughput, 3 runs (KiB/s) |
|---|---|---|
| baseline | timeout_millis => 30000 | 64.9 / 65.8 / 62.1 |
| baseline | timeout_millis => 0 | 78.2 / 75.6 / 73.3 |
| updated plugin (from PR) | timeout_millis => 30000 | 56.2 / 54.3 / 58.7 |
| updated plugin (from PR) | timeout_millis => 0 | 74.6 / 75.0 / 70.2 |
| updated plugin (from PR) | timeout_millis => 30000, timeout_grouped => true | 69.8 / 66.6 / 69.1 |
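Averaging the three runs makes the comparison easier to read. The short Ruby script below takes the smoke-test figures above (KiB/s), computes each configuration's mean, and reports its change relative to the baseline with `timeout_millis => 30000`; the labels are just descriptive strings chosen for this sketch.

```ruby
# Smoke-test results (KiB/s, three runs each), copied from the thread.
RUNS = {
  "baseline, timeout_millis => 30000"            => [64.9, 65.8, 62.1],
  "baseline, timeout_millis => 0"                => [78.2, 75.6, 73.3],
  "PR, timeout_millis => 30000"                  => [56.2, 54.3, 58.7],
  "PR, timeout_millis => 0"                      => [74.6, 75.0, 70.2],
  "PR, timeout_millis => 30000, timeout_grouped" => [69.8, 66.6, 69.1],
}.freeze

# Arithmetic mean of a list of floats.
def mean(values)
  values.sum / values.size
end

reference = mean(RUNS["baseline, timeout_millis => 30000"])

RUNS.each do |label, runs|
  avg = mean(runs)
  change = (avg / reference - 1) * 100
  printf("%-46s mean %5.1f KiB/s  %+6.1f%% vs. baseline (30000)\n", label, avg, change)
end
```

For example, the PR build with a per-pattern timeout averages 56.4 KiB/s against the baseline's 64.3 KiB/s, while the `timeout_grouped` run averages 68.5 KiB/s.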

colinsurprenant commented 4 years ago

Great stuff, @kares! I left a naming-suggestion comment. I really like the TimeoutSupport abstraction. LGTM code-wise so far.

kares commented 4 years ago

Based on the smoke-test numbers above, the current code surprisingly gets slower (~5%) with timeout_millis => 30000 (my guess is that it's either the Struct or the fact that blocks are not being inlined).

jsvd commented 4 years ago

BTW, I used https://gist.github.com/jsvd/23dbb156904e9ba770d48bb971b6735e#file-gistfile1-txt to stress-test your change, and the difference is dramatic: about 20k eps (events per second) with the current 7.4.2, and 90k eps with this patch and timeout_grouped enabled.

kares commented 4 years ago

Yes, the more patterns there are, the more it should improve; that part I am happy with :1st_place_medal:. I just do not like that we're a bit slower for the default case, though maybe it's not that relevant.
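A rough cost model shows why the gain should grow with the number of patterns. Assume (this is a simplifying assumption, not a measurement from the thread) that an event tries k patterns, that each match attempt costs c_m, and that arming and disarming one timer costs a fixed c_t:

```latex
\begin{aligned}
C_{\text{pattern}}(k) &= k\,(c_m + c_t) \\
C_{\text{event}}(k)   &= k\,c_m + c_t \\
C_{\text{pattern}}(k) - C_{\text{event}}(k) &= (k - 1)\,c_t
\end{aligned}
```

Under this model the saving per event is (k - 1) c_t, so it grows linearly with the number of patterns tried, and with a single pattern (k = 1) the two scopes cost the same.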

jsvd commented 4 years ago

I tested with a single pattern using https://gist.github.com/jsvd/23dbb156904e9ba770d48bb971b6735e#file-stress_single_pattern and could not see any significant difference.

kares commented 4 years ago

The ~5% degradation mostly affects timeout_grouped: false, but we can advise users to flip the switch! :fist_right: I looked into it, and it's due to the additional block being passed and yielded (blocks are not yet inlined in JRuby).

kares commented 4 years ago

With some OOP (to avoid dummy block passes), smoke-test performance is now close to the baseline.
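One way to avoid a "dummy" block pass is to choose a strategy object once, at setup, so that the no-timeout path calls the matcher directly instead of yielding through a wrapper. The Ruby sketch below illustrates that general pattern; `NoTimeout`, `WithTimeout` and `build_matcher` are hypothetical names, not the plugin's actual classes, and the stdlib `Timeout` module stands in for the plugin's own enforcer.

```ruby
require 'timeout'

# Used when timeouts are disabled (timeout_millis => 0):
# the regex runs directly, with no block passed or yielded.
class NoTimeout
  def match(regex, text)
    regex.match(text)
  end
end

# Used when timeouts are enabled: only this path pays for the block.
class WithTimeout
  def initialize(millis)
    @seconds = millis / 1000.0
  end

  def match(regex, text)
    Timeout.timeout(@seconds) { regex.match(text) }
  end
end

# Pick the strategy once, when the filter is configured,
# instead of branching (or passing a block) on every event.
def build_matcher(timeout_millis)
  timeout_millis.to_i > 0 ? WithTimeout.new(timeout_millis) : NoTimeout.new
end

matcher = build_matcher(0)
p matcher.class                              # NoTimeout
p matcher.match(/foo1: (\d+)/, "foo1: 7")[1] # "7"
```

Both strategies expose the same `match` method, so the per-event code path stays identical while the disabled case skips the block entirely.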

elasticsearch-bot commented 4 years ago

Karol Bucek merged this into the following branches!

Branch: master
Commits: 3527b14741e0374be9f4def0ac52c599437390e1, d4aac7c007bf7f3dece7ae28a6518c3ccc1ecf18, 3c5e4c54ec97479421764c7592b3f080ba4cebd7, 8fcb5f899df156758229c287dea12f7674782899, d118d93ab5d6cc2088de895e7fc2817e70c3b1c3, cd7d92eff50fe825d408f049b96e9d3699dd1900, 7a2c2122be8f785712c59b221e642f3df8fc5f30, a317df8613c1a22f3b99126f8abb468d79ded3c8