logstash-plugins / logstash-filter-kv

Apache License 2.0

fix performance regression when using `field_split` and `value_split` char classes #71

Closed yaauie closed 6 years ago

yaauie commented 6 years ago

Resolves: #70

My implementation of split-by-pattern introduced a significant performance regression.

Essentially, matching with a negated character class is significantly faster than the negated pattern-match I constructed (a lookahead followed by a single character).

This PR uses negated character classes wherever practical, and falls back to the slower lookahead-based method only for work that negated character classes cannot do.
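
As a rough illustration of the two constructions, here is a minimal Ruby sketch. The delimiters (`&`, space, and `::`), the sample inputs, and all three expressions are assumptions made up for this example; they are not the expressions the plugin generates, and the lookahead form is assumed to be a negative lookahead wrapped around a single `.`.

```ruby
# Illustrative only: delimiters and inputs are made up, and these are not
# the exact expressions logstash-filter-kv generates.
input = "a=1&b=2 c=3"

# Negated character class: consumes a whole run of non-delimiter characters.
negated_charclass = /[^& ]+/

# Lookahead + single character: at every position, assert that no delimiter
# starts here, then consume exactly one character.
lookahead_single_char = /(?:(?![& ]).)+/

charclass_tokens = input.scan(negated_charclass)
lookahead_tokens = input.scan(lookahead_single_char)

# A multi-character delimiter (here the hypothetical "::") cannot be written
# as a negated character class, so the lookahead form is still needed.
# The delimiter (or end of input) is consumed explicitly after each token.
multi_char_tokens = "a=1::b=2".scan(/((?:(?!::).)+)(?:::|\z)/).flatten
```

Both single-character forms accept the same tokens here; the difference is only in how much work the regex engine does per character, which is what the regression came down to.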

Using the input and config from #70 and the expressions generated by the plugin, I'm able to show that this PR brings the performance of the generated expressions back to pre-regression (4.0.3) levels:

╭─{ yaauie@castrovel:~/src/elastic/logstash-plugins/logstash-filter-kv (✔ performance-regression-fix) }
╰─● ruby benchmark.ips.ruby
Warming up --------------------------------------
               4.0.3     2.041k i/100ms
               4.2.0     1.269k i/100ms
   negated-charclass     2.118k i/100ms
Calculating -------------------------------------
               4.0.3     21.505k (± 3.7%) i/s -      1.290M in  60.070405s
               4.2.0     12.933k (± 4.6%) i/s -    774.090k in  60.014354s
   negated-charclass     21.703k (± 4.0%) i/s -      1.300M in  60.031555s

Comparison:
   negated-charclass:    21703.0 i/s
               4.0.3:    21505.1 i/s - same-ish: difference falls within error
               4.2.0:    12932.8 i/s - 1.68x  slower

ruby benchmark.ips.ruby  221.12s user 1.34s system 104% cpu 3:32.68 total
[success (212.000s)]

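
The benchmark script itself is not included in this PR description. A comparable measurement could be sketched as follows, using Ruby's standard `benchmark` library rather than the iterations-per-second tool the output above appears to come from. The input, the expressions, and the iteration count are hypothetical and are not those from #70, so absolute timings will differ.

```ruby
require "benchmark"

# Illustrative only: hypothetical input and expressions, not those from #70.
input = (["key=value"] * 50).join("&")
negated_charclass = /[^&]+/
lookahead_single_char = /(?:(?!&).)+/

iterations = 2_000

# Time repeated scans with the negated character class.
charclass_seconds = Benchmark.realtime do
  iterations.times { input.scan(negated_charclass) }
end

# Time repeated scans with the lookahead + single character form.
lookahead_seconds = Benchmark.realtime do
  iterations.times { input.scan(lookahead_single_char) }
end

puts format("negated-charclass: %.4fs", charclass_seconds)
puts format("lookahead:         %.4fs", lookahead_seconds)
```

Timings depend on the machine and Ruby version, so the sketch prints them without asserting any particular ratio.
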
colinsurprenant commented 6 years ago

Good strategy! LGTM.