When I implemented split-by-pattern, I introduced a significant performance regression.
The cause: a negated character class is significantly faster than the negated pattern match I constructed (a negative lookahead followed by a single character).
This PR uses negated character classes wherever practical, and falls back to the slower lookahead construction only for work that negated character classes cannot express.
Using the input and config from #70 and the expressions generated by the plugin, I can show that this PR brings the performance of the generated expressions back to pre-regression levels (4.0.3 is the last release before the regression, 4.2.0 the release with it):
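To make the two constructions concrete, here is a minimal sketch in Ruby. The `&` delimiter, the sample input, and the constant names are assumptions chosen for illustration; they are not the expressions the plugin actually generates. It only shows that, for a single-character delimiter, both forms match the same tokens, and it times them with the standard library's `Benchmark` rather than the benchmark-ips run shown in this PR.

```ruby
require 'benchmark'

# Hypothetical delimiter and input, chosen only for illustration.
SAMPLE = "a=1&b=2&c=3"

# Negated character class: one or more characters that are not "&".
NEGATED_CHARCLASS = /[^&]+/

# Negative lookahead + single character: at each position, assert that
# the delimiter does not start here, then consume one character.
# The /m flag lets "." match newlines, as the character class does.
NEGATED_LOOKAHEAD = /(?:(?!&).)+/m

CHARCLASS_TOKENS = SAMPLE.scan(NEGATED_CHARCLASS)
LOOKAHEAD_TOKENS = SAMPLE.scan(NEGATED_LOOKAHEAD)

# Rough timing only; absolute numbers depend on the Ruby version and machine.
ITERATIONS = 20_000
charclass_time = Benchmark.realtime { ITERATIONS.times { SAMPLE.scan(NEGATED_CHARCLASS) } }
lookahead_time = Benchmark.realtime { ITERATIONS.times { SAMPLE.scan(NEGATED_LOOKAHEAD) } }

puts "negated character class: #{charclass_time.round(4)}s"
puts "lookahead + single char: #{lookahead_time.round(4)}s"
```

The lookahead form remains necessary when the delimiter is a multi-character pattern, since a character class can only exclude individual characters.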
╭─{ yaauie@castrovel:~/src/elastic/logstash-plugins/logstash-filter-kv (✔ performance-regression-fix) }
╰─● ruby benchmark.ips.ruby
Warming up --------------------------------------
4.0.3 2.041k i/100ms
4.2.0 1.269k i/100ms
negated-charclass 2.118k i/100ms
Calculating -------------------------------------
4.0.3 21.505k (± 3.7%) i/s - 1.290M in 60.070405s
4.2.0 12.933k (± 4.6%) i/s - 774.090k in 60.014354s
negated-charclass 21.703k (± 4.0%) i/s - 1.300M in 60.031555s
Comparison:
negated-charclass: 21703.0 i/s
4.0.3: 21505.1 i/s - same-ish: difference falls within error
4.2.0: 12932.8 i/s - 1.68x slower
ruby benchmark.ips.ruby 221.12s user 1.34s system 104% cpu 3:32.68 total
[success (212.000s)]
Resolves: #70