Closed kares closed 4 years ago
"smoke" performance test - using 10 (simple) always failing patterns :
input {
  generator {
    lines => ["aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "ddddddddddd", "eeeeeeeeee"]
    count => 1000000
  }
}
filter {
  grok {
    timeout_millis => 30000
    match => {
      "message" => [
        "foo1: %{NUMBER:bar}", "foo2: %{NUMBER:bar}", "foo3: %{NUMBER:bar}", "foo4: %{NUMBER:bar}", "foo5: %{NUMBER:bar}",
        "foo6: %{NUMBER:bar}", "foo7: %{NUMBER:bar}", "foo8: %{NUMBER:bar}", "foo9: %{NUMBER:bar}", "foo10: %{NUMBER:bar}"
      ]
    }
  }
}
output { stdout { codec => dots {} } }
timeout_millis => 30000
[64,9KiB/s] [65,8KiB/s] [62,1KiB/s]
timeout_millis => 0
[78,2KiB/s] [75,6KiB/s] [73,3KiB/s]
timeout_millis => 30000
[56,2KiB/s] [54,3KiB/s] [58,7KiB/s]
timeout_millis => 0
[74,6KiB/s] [75,0KiB/s] [70,2KiB/s]
timeout_millis => 30000 timeout_grouped => true
[69,8KiB/s] [66,6KiB/s] [69,1KiB/s]
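To make the figures above easier to compare, here is a small Ruby sketch that averages the three samples of each run. The figures are copied verbatim in the order listed; the helper name `mean_kib` is made up for this example, and the comma is treated as a decimal separator.

```ruby
# Figures copied from the runs above, in the order they were listed.
RUNS = [
  ["timeout_millis => 30000", "[64,9KiB/s] [65,8KiB/s] [62,1KiB/s]"],
  ["timeout_millis => 0", "[78,2KiB/s] [75,6KiB/s] [73,3KiB/s]"],
  ["timeout_millis => 30000", "[56,2KiB/s] [54,3KiB/s] [58,7KiB/s]"],
  ["timeout_millis => 0", "[74,6KiB/s] [75,0KiB/s] [70,2KiB/s]"],
  ["timeout_millis => 30000 timeout_grouped => true", "[69,8KiB/s] [66,6KiB/s] [69,1KiB/s]"],
]

# Hypothetical helper: parse "64,9KiB" style samples and return their mean.
def mean_kib(samples)
  values = samples.scan(/(\d+),(\d+)KiB/).map { |whole, frac| "#{whole}.#{frac}".to_f }
  values.sum / values.size
end

RUNS.each do |settings, samples|
  puts format("%-50s %.1f KiB/s", settings, mean_kib(samples))
end
```

Three samples per run is a small set, so the averages are only a rough guide.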
Great stuff @kares - left a naming suggestion comment. I really like the TimeoutSupport abstraction. LGTM code-wise so far.
based on the (above smoke test) numbers - surprisingly the current code does get (~5%) slower for timeout_millis => 30000 (guessing it's either the Struct or the fact that blocks are not inlining)
btw I used https://gist.github.com/jsvd/23dbb156904e9ba770d48bb971b6735e#file-gistfile1-txt to stress test your change and the difference is dramatic: about 20k eps with current 7.4.2, and 90k eps with this patch and timeout_grouped enabled
yy - the more patterns, the more it should improve ... that part I am happy with :1st_place_medal: just do not like that we're a bit slower for the default case - maybe it's not that relevant.
I have tested with a single pattern using https://gist.github.com/jsvd/23dbb156904e9ba770d48bb971b6735e#file-stress_single_pattern and could not see a significant difference at all
~5% degradation mostly impacts timeout_grouped: false, but we can advise users to flip the switch! :fist_right:
looked into it and it's due to the additional block being passed and yielded (they do not yet inline in JRuby).
with some "oop" (to avoid dummy block passes) - smoke performance now shows close to baseline.
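For anyone following along, a minimal Ruby sketch of the two shapes being discussed. All class and method names here are hypothetical and not taken from the plugin, and the actual timer handling is left as a comment; it only illustrates a block passed on every call versus picking an object once up front.

```ruby
# Hypothetical names throughout; this is not the plugin's actual code.

# Shape 1: every match goes through a method that takes a block and yields,
# even when no timeout is configured.
class YieldingMatcher
  def initialize(timeout_enabled)
    @timeout_enabled = timeout_enabled
  end

  def with_timeout
    # When disabled the block is a pass-through, but it is still created
    # and yielded on every call.
    return yield unless @timeout_enabled
    # A real implementation would start and cancel a timer around the yield.
    yield
  end

  def match(pattern, input)
    with_timeout { pattern.match?(input) }
  end
end

# Shape 2: choose an object once at setup time, so the path without a
# timeout is a plain method call with no block involved.
class NoTimeoutMatcher
  def match(pattern, input)
    pattern.match?(input)
  end
end

class TimedMatcher
  def match(pattern, input)
    # A real implementation would start and cancel a timer here.
    pattern.match?(input)
  end
end

def build_matcher(timeout_millis)
  timeout_millis > 0 ? TimedMatcher.new : NoTimeoutMatcher.new
end
```

Both shapes return the same result; the second just moves the "is a timeout enabled?" decision out of the per-event path.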
Karol Bucek merged this into the following branches!
Branch | Commits |
---|---|
master | 3527b14741e0374be9f4def0ac52c599437390e1, d4aac7c007bf7f3dece7ae28a6518c3ccc1ecf18, 3c5e4c54ec97479421764c7592b3f080ba4cebd7, 8fcb5f899df156758229c287dea12f7674782899, d118d93ab5d6cc2088de895e7fc2817e70c3b1c3, cd7d92eff50fe825d408f049b96e9d3699dd1900, 7a2c2122be8f785712c59b221e642f3df8fc5f30, a317df8613c1a22f3b99126f8abb468d79ded3c8 |
this is a new feature (off by default) meant to reduce the high cost of timeouts (follow-up on https://github.com/logstash-plugins/logstash-filter-grok/pull/147)
an attempt to address: https://github.com/logstash-plugins/logstash-filter-grok/issues/152
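A rough Ruby sketch of why grouping could reduce the timeout cost, assuming timeout_grouped means one timeout around the whole pattern list instead of one per pattern (that reading is an assumption, not confirmed here). `CountingTimer` and the two `match_*` methods are hypothetical stand-ins; the timer only counts how often it would be armed.

```ruby
# Hypothetical sketch; names are illustrative, not the plugin's API.
class CountingTimer
  attr_reader :armed

  def initialize
    @armed = 0
  end

  # Stand-in for starting and cancelling a real timeout around the block.
  def guard
    @armed += 1
    yield
  end
end

# Ten patterns that never match the test input, like the smoke test.
PATTERNS = (1..10).map { |i| /foo#{i}: \d+/ }

# One timeout per pattern attempt.
def match_per_pattern(timer, input)
  PATTERNS.any? { |re| timer.guard { re.match?(input) } }
end

# One timeout around the whole list (the assumed meaning of timeout_grouped).
def match_grouped(timer, input)
  timer.guard { PATTERNS.any? { |re| re.match?(input) } }
end

per_pattern = CountingTimer.new
grouped = CountingTimer.new
match_per_pattern(per_pattern, "aaaaaaaaaa")
match_grouped(grouped, "aaaaaaaaaa")
puts "per-pattern arms: #{per_pattern.armed}, grouped arms: #{grouped.armed}"
# prints: per-pattern arms: 10, grouped arms: 1
```

With always-failing patterns every pattern is tried, so the per-pattern variant pays the timer cost once per pattern while the grouped one pays it once per event.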
some things left to wrap this one up:
- timeout_grouped sound okay?
- eventually a performance test guard (could be added later).