PERFORMANCE: Fix slow metric invocation and needless locking on timeout enforcer

original-brownbear commented 6 years ago

This is a more than a 100% speedup for me relative to master when testing against the test configuration posted here https://github.com/elastic/logstash/issues/8560#issue-270112018.

I know Travis will probably go red on this one, but it's not broken functionally :) see below:

Fixed slow metric API use (see https://github.com/elastic/logstash/pull/7919), this is the biggest part of the speedup and makes execution twice as fast right off the bat (makes Grok incompatible with versions before https://github.com/elastic/logstash/pull/7941 though :( ... I think that means before 5.6.2 but have to check later) If we don't want to break BwC I could work around things (just use a per-thread counter for the metrics and increment the metrics every now and then works as well, just more code)
Fixed the whole thread map to be lock-free by using a concurrent hashmap (~20% speedup from this on the test config)
Removed yield(/dynamic block creation) from the timeout grok method and just used an outright method call there (mostly saving memory here, not much throughput gain in my tests)
Actually saved calls to java.lang.System.nanoTime :), now was unused
Inlined away success (trivial but in JRuby this actually saves memory :D)

jakelandis commented 6 years ago

I think you can also get some (i think) easy performance boost by using a thread pool here (or executor service) instead of creating a new thread every time.

original-brownbear commented 6 years ago

@jakelandis but I'm only creating that thread once when starting Grok? I'm not creating any new threads periodically as far as I can see.

jakelandis commented 6 years ago

Ahh nevermind, i was mis-understanding how this worked. (also why I shouldn't do the review for this change :) )

original-brownbear commented 6 years ago

@andrewvc how about this one https://github.com/logstash-plugins/logstash-filter-grok/pull/125/commits/1601d93a5493af4928193ed61ca1d5c0f315d62c ?

This actually seems to be sufficient to make things safe (enough) without locking in non-timeout operation:

By using the return of stop_thread_groking to determine whether or not there was actually a grokking in progress when I called the method (this is safe because remove is atomic on the concurrent hashmap) I avoid killing a thread that is currently doing some non-grok filter/output action accidentally
By locking the interrupt handling I ensure that I won't return from a grok_till_timeout call unless I know that the timeout enforcer has finished it's interrupt call (note that it will only go into the branch containing the interrupt before if stop_thread_groking(thread) is called in the worker thread)

Let me know what you think, this change seems to actually make things faster (in theory, benchmarks didn't change for me) since we save some redundant java.lang.Thread.interrupted and accidental thread.interrupt() calls.

Let me know what you think :)

jordansissel commented 6 years ago

@original-brownbear not to derail, but I'm open to having a timeout feature in the Grok library itself if you think that is more appropriate than having Logstash wrap Grok with this interruption feature.

original-brownbear commented 6 years ago

@jordansissel hmm I don't know enough about the Grok build to answer that I fear :( but :) ...

assuming we can get some Java code compiled in the Grok build we could probably build a somewhat faster and much, much cleaner solution like by simply wrapping the joni regex library on the Java end so that we get:

  def execute(text)
    @regexp.match(text, timeout)
  end

which has the handling of the enforcer thread and interrupting all in Java (should be a visible gain memory wise to save at least on Ruby thread + we can implement some of that interrupting logic a little faster in Java potentially). More importantly it would make what @andrewvc suggested (making the timeout logic not run if configured with timeout 0) much cleaner as well (imo). => I'm all for it, not sure we want to do that in this iteration though, this seemed urgent?

jordansissel commented 6 years ago

we could probably build a somewhat faster and much, much cleaner solution like by simply wrapping the joni regex library on the Java end

Yep!

I'm all for it, not sure we want to do that in this iteration though

Agreed. We don't necessarily need to do this now.

original-brownbear commented 6 years ago

Build failure is a preexisting condition btw: https://travis-ci.org/logstash-plugins/logstash-filter-grok/jobs/296953878#L918

marioplumbarius commented 6 years ago

I rerun the benchmarks for this new patch: https://github.com/elastic/logstash/issues/8560#issuecomment-344582851

andrewvc commented 6 years ago

I'm ok with this if we add an explanatory comment.

On Nov 22, 2017 6:12 AM, "Armin Braun" notifications@github.com wrote:

@original-brownbear commented on this pull request.

In lib/logstash/filters/grok/timeout_enforcer.rb https://github.com/logstash-plugins/logstash-filter-grok/pull/125#discussion_r152545854 :

now = java.lang.System.nanoTime # save ourselves some nanotime calls

elapsed = java.lang.System.nanoTime - start_time

if elapsed > @timeout_nanos

elapsed_millis = elapsed / 1000

thread.interrupt()

Ensure that we never attempt to cancel this thread twice in the event

of weird races

stop_thread_groking(thread)

now = java.lang.System.nanoTime # save ourselves some nanotime calls

@threads_to_start_time.entry_set.each do |entry|

start_time = entry.get_value

if start_time < now && now - start_time > @timeout_nanos

thread = entry.get_key

Ensure that we never attempt to cancel this thread twice in the event

of weird races

if stop_thread_groking(thread)

@andrewvc https://github.com/andrewvc you're not, but I think this is a risk we can cleanly take, because with the latest changes we will never interrupt a thread that isn't grokking (sorry for failing to comment on this more explicitly). Think about the orders of magnitude here:

The cancel thread will do an iteration over a very small number of entries, some numeric comparisons and (if it indeed does have to interrupt some stuff) a volatile write.

In order to kill the wrong grok action, the the filter + output + poll + from queue + start new grokking action in the worker would have to be faster than a numeric comparison + ConcurrentHashMap#delete here. I think that's close to theoretically impossible and practically impossible. I can see how you could make an argument for this happening by virtue of some freak scheduling decision in the kernel, but practically it's impossible imo. Even if there is a theoretical chance (and this is easily less than 1:1M imo looking at the number of cycles the worker has to take for this to happen vs. the number of cycles the enforcer thread uses up), the effect is at worst wrongfully cancelling a grok action.

I can add a sequence number to this to make it 100% safe (that was my initial plan I explained above), but imo it wasn't worth it performance (small hit, but a hit) + complexity wise ( another 10 lines).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/logstash-plugins/logstash-filter-grok/pull/125#discussion_r152545854, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIBYy73bZOIcuHgeOLSVZFlLLPIRvLbks5s5A-hgaJpZM4QSdlr .

original-brownbear commented 6 years ago

@andrewvc added some documentation on it here https://github.com/logstash-plugins/logstash-filter-grok/pull/125/commits/fcf77705d32a69a6a6ffeefee720737bd7c6b128 :)

original-brownbear commented 6 years ago

@andrewvc Thanks! Same test failures happen in master as well. Will look into them later today (it's my job this week anyhow :))

elasticsearch-bot commented 6 years ago

Armin Braun merged this into the following branches!

Branch	Commits
master	02121f77ce25fa5ebb53dd3dbc6f2dd539f4d723, dfde844cc0650d4779485776942cfbb19871a4c7, cff80356f05794457474dfc8ffd76f5ed55070c8, ae6e64746d4f7d2441e658f532e9da19d5bde924

IrlJidel commented 6 years ago

@andrewvc Is 3.4.5 published to rubygems.org ?

andrewvc commented 6 years ago

Yes, our internal workflow for doing so hit a snag. @original-brownbear can you look at it when you have a chance?

On Nov 23, 2017 6:33 AM, "IrlJidel" notifications@github.com wrote:

Is 3.4.5 published to rubygems.org ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/logstash-plugins/logstash-filter-grok/pull/125#issuecomment-346633431, or mute the thread https://github.com/notifications/unsubscribe-auth/AAIBYx5ag-q_3c9ueLMcKjZv3xIQpQY4ks5s5YI0gaJpZM4QSdlr .

original-brownbear commented 6 years ago

@andrewvc it looks like this is a job for @jordansissel, not sure what I can do here. He's already been pinged, but I'll keep an eye on this so we won't forget to release this once things are repaired.

andrewvc commented 6 years ago

Published as 3.4.5

jsvd commented 6 years ago

As a correction, this was released as 4.0.0, version 3.4.5 was yanked from rubygems as it broke bwc with logstash 5.5

logstash-plugins / logstash-filter-grok

PERFORMANCE: Fix slow metric invocation and needless locking on timeout enforcer #125

@original-brownbear commented on this pull request.

Ensure that we never attempt to cancel this thread twice in the event

of weird races

Ensure that we never attempt to cancel this thread twice in the event

of weird races