logstash-plugins / logstash-filter-kv

Apache License 2.0
17 stars 42 forks source link

use the Timeout class instead of relying on native thread interruption #84

Closed colinsurprenant closed 5 years ago

colinsurprenant commented 5 years ago

Per elastic/logstash#10976 the usage of native thread interruption for controlling timed execution seems to be creating problematic side effects.

This PR replaces the original timeout enforcer logic with the usage of the Timeout class which should behave a lot better.

yaauie commented 5 years ago

I think it could be useful to switch implementation to use jruby's Timeout, but would prefer to modify the abstraction of a TimeoutEnforcer instead of ripping it out entirely.

  1. it allows us to set up a super light-weight implementation at plugin registration instead of choosing code-paths at runtime for each and every event we process.
  2. it gives us room to change out the implementation again at a later time if we discover a better path forward, without having to also change the business logic again.

I've done so on a branch here: https://github.com/yaauie/logstash-filter-kv/commit/08f6945ce3fd922a5eba674d0fa4a03f4710fd4f


If we do chose to rip out this implementation entirely, the initialize_timeout_enforcer method and TimeoutException class would be orphaned and would need to be be ripped out.

colinsurprenant commented 5 years ago

@yaauie I see your point but I am not sure we need to abstract a simple timeout here, there is not really any complexity to abstract, it is already abstracted in the Timeout class and frankly the chances this timeout implementation changes is rather small, if anything, potential bugs or improvements will be done in the Timeout class itself. Also, I think the efficiency of creating/calling a new closure versus a code path choice with a simple if is arguable and marginal at best. I actually like the simplicity and explicitness of the new construct, but this is more style than functionality.

Good catch on the initialize_timeout_enforcer left over - I will remove it. For TimeoutException it is still used and passed to Timeout.timeout(@timeout_seconds, TimeoutException) which allows keeping all the rescue clauses intact. We should definitely keep that.

colinsurprenant commented 5 years ago

Thanks @yaauie for the review.

Note that before merging we should do some more manual sanity tests (and bump version) and report back here. /CC @jsvd

colinsurprenant commented 5 years ago

Similar fix in logstash-plugins/logstash-filter-grok#147

colinsurprenant commented 5 years ago

Performance

These are quick smoke performance tests where EPS is visually approximated using per second metrics stats.

We can see that the performance signatures are very similar so there should not be any noticeable impact, especially when overall performance will be limited by the real regex parsing performance cost (since we used extremely short string to try to factor out the actual regex parsing cost here for these tests).

The following pipeline was used for testing performance (where timeout_millis was changed).

bin/logstash -e 'input{generator{lines => ["aa", "bb", "cc", "dd"] count => 1000000}} filter{kv{timeout_millis => 0 }} output{stdout{codec => dots{}}}
colinsurprenant commented 5 years ago

v4.4.0 published.