logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

PERFORMANCE: Fix slow metric invocation and needless locking on timeout enforcer #125

Closed. original-brownbear closed this 6 years ago.

original-brownbear commented 6 years ago

This is more than a 100% speedup for me relative to master when testing against the test configuration posted here: https://github.com/elastic/logstash/issues/8560#issue-270112018.

I know Travis will probably go red on this one, but it's not functionally broken :) see below:

jakelandis commented 6 years ago

I think you can also get an easy performance boost by using a thread pool (or executor service) here instead of creating a new thread every time.

original-brownbear commented 6 years ago

@jakelandis but I'm only creating that thread once when starting Grok? I'm not creating any new threads periodically as far as I can see.

jakelandis commented 6 years ago

Ahh nevermind, I was misunderstanding how this worked. (Also why I shouldn't do the review for this change :) )

original-brownbear commented 6 years ago

@andrewvc how about this one https://github.com/logstash-plugins/logstash-filter-grok/pull/125/commits/1601d93a5493af4928193ed61ca1d5c0f315d62c ?

This actually seems to be sufficient to make things safe (enough) without locking in non-timeout operation.

It also seems to make things a bit faster (in theory; benchmarks didn't change for me) since we save some redundant java.lang.Thread.interrupted and accidental thread.interrupt() calls.

Let me know what you think :)
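For illustration, here is a minimal JRuby sketch of the lock-free bookkeeping described above. The map @threads_to_start_time and the method stop_thread_groking appear in the diff quoted later in this thread; the class wrapper, start_thread_groking, and cancel_timed_out! names are assumptions made for the sketch, not necessarily the plugin's actual code:

  # Sketch only, assuming JRuby so that java.util.concurrent is available.
  require "java"

  class TimeoutEnforcerSketch
    def initialize(timeout_millis)
      @timeout_nanos = timeout_millis * 1_000_000
      # Thread -> start time (nanos); a concurrent map, so workers and the
      # enforcer thread never need a shared lock.
      @threads_to_start_time = java.util.concurrent.ConcurrentHashMap.new
    end

    # Worker thread: register itself right before it starts matching.
    def start_thread_groking(thread = java.lang.Thread.currentThread)
      @threads_to_start_time.put(thread, java.lang.System.nanoTime)
    end

    # Atomic remove: returns true only for the one caller that actually removed
    # the entry, so a thread that has already finished grokking (and has been
    # deregistered) can never be interrupted by accident.
    def stop_thread_groking(thread = java.lang.Thread.currentThread)
      !@threads_to_start_time.remove(thread).nil?
    end

    # Enforcer thread: interrupt only threads that are still registered and
    # have exceeded the timeout; the non-timeout path takes no lock at all.
    def cancel_timed_out!
      now = java.lang.System.nanoTime
      @threads_to_start_time.entry_set.each do |entry|
        start_time = entry.get_value
        next unless now - start_time > @timeout_nanos
        thread = entry.get_key
        thread.interrupt if stop_thread_groking(thread)
      end
    end
  end

In the plugin itself this kind of scan runs on the single enforcer thread created when the filter starts, as discussed above.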

jordansissel commented 6 years ago

@original-brownbear not to derail, but I'm open to having a timeout feature in the Grok library itself if you think that is more appropriate than having Logstash wrap Grok with this interruption feature.

original-brownbear commented 6 years ago

@jordansissel hmm I don't know enough about the Grok build to answer that I fear :( but :) ...

Assuming we can get some Java code compiled in the Grok build, we could probably build a somewhat faster and much, much cleaner solution like by simply wrapping the joni regex library on the Java end so that we get:

  def execute(text)
    @regexp.match(text, timeout)
  end

which has the handling of the enforcer thread and interrupting all in Java (there should be a visible memory gain from saving at least one Ruby thread, plus we can potentially implement some of that interrupting logic a little faster in Java). More importantly, it would make what @andrewvc suggested (not running the timeout logic at all when configured with timeout 0) much cleaner as well (imo). => I'm all for it, not sure we want to do that in this iteration though, this seemed urgent?
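As a rough sketch of that timeout-0 short-circuit (the enforcer object, @timeout_millis, setup_timeout_enforcer, and match_with_timeout are illustrative names, not the plugin's actual API), the wiring could look like:

  # Sketch only: when the grok timeout is configured as 0, never start the
  # enforcer thread and skip all per-match registration.
  def setup_timeout_enforcer
    @timeout_enforcer =
      if @timeout_millis > 0
        TimeoutEnforcerSketch.new(@timeout_millis) # plus starting its enforcer thread
      end
  end

  def match_with_timeout(grok, text)
    return grok.execute(text) unless @timeout_enforcer
    @timeout_enforcer.start_thread_groking
    begin
      grok.execute(text)
    ensure
      @timeout_enforcer.stop_thread_groking
    end
  end

With timeout 0 the match path is then a plain grok.execute call, with no extra map writes or interrupt handling.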

jordansissel commented 6 years ago

we could probably build a somewhat faster and much, much cleaner solution like by simply wrapping the joni regex library on the Java end

Yep!

I'm all for it, not sure we want to do that in this iteration though

Agreed. We don't necessarily need to do this now.

original-brownbear commented 6 years ago

Build failure is a preexisting condition btw: https://travis-ci.org/logstash-plugins/logstash-filter-grok/jobs/296953878#L918

marioplumbarius commented 6 years ago

I reran the benchmarks for this new patch: https://github.com/elastic/logstash/issues/8560#issuecomment-344582851

andrewvc commented 6 years ago

I'm ok with this if we add an explanatory comment.

On Nov 22, 2017 6:12 AM, "Armin Braun" notifications@github.com wrote:

@original-brownbear commented on this pull request.

In lib/logstash/filters/grok/timeout_enforcer.rb https://github.com/logstash-plugins/logstash-filter-grok/pull/125#discussion_r152545854 :

  -  now = java.lang.System.nanoTime # save ourselves some nanotime calls
  -  elapsed = java.lang.System.nanoTime - start_time
  -  if elapsed > @timeout_nanos
  -    elapsed_millis = elapsed / 1000
  -    thread.interrupt()
  -    # Ensure that we never attempt to cancel this thread twice in the event
  -    # of weird races
  -    stop_thread_groking(thread)
  +  now = java.lang.System.nanoTime # save ourselves some nanotime calls
  +  @threads_to_start_time.entry_set.each do |entry|
  +    start_time = entry.get_value
  +    if start_time < now && now - start_time > @timeout_nanos
  +      thread = entry.get_key
  +      # Ensure that we never attempt to cancel this thread twice in the event
  +      # of weird races
  +      if stop_thread_groking(thread)
@andrewvc you're not, but I think this is a risk we can cleanly take, because with the latest changes we will never interrupt a thread that isn't grokking (sorry for failing to comment on this more explicitly). Think about the orders of magnitude here:

  • The cancel thread will do an iteration over a very small number of entries, some numeric comparisons and (if it indeed does have to interrupt some stuff) a volatile write.

In order to kill the wrong grok action, the filter + output + poll from queue + start new grokking action in the worker would have to be faster than a numeric comparison + ConcurrentHashMap#delete here. I think that's practically impossible and close to theoretically impossible. I can see how you could make an argument for this happening by virtue of some freak scheduling decision in the kernel, but practically it's impossible imo. Even if there is a theoretical chance (and this is easily less than 1:1M imo, looking at the number of cycles the worker has to take for this to happen vs. the number of cycles the enforcer thread uses up), the effect is at worst wrongfully cancelling a grok action.

I can add a sequence number to this to make it 100% safe (that was my initial plan I explained above), but imo it wasn't worth it performance-wise (small hit, but a hit) + complexity-wise (another 10 lines).
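For completeness, the sequence-number variant mentioned above (which was not implemented) could look roughly like the sketch below; it assumes the same @threads_to_start_time ConcurrentHashMap and is only meant to show why it would cost an extra write per match plus a few more lines:

  # Sketch only: register a fresh [start_time, seq] pair per grok invocation.
  def start_thread_groking(thread = java.lang.Thread.currentThread)
    @sequence ||= java.util.concurrent.atomic.AtomicLong.new
    @threads_to_start_time.put(thread, [java.lang.System.nanoTime, @sequence.increment_and_get])
  end

  # Enforcer side: observed_entry is the [start_time, seq] pair that was read
  # from the map when the timeout was detected. remove(key, value) is atomic and
  # only succeeds while that exact registration is still present, so a newer
  # grok invocation on the same worker thread can never be cancelled by mistake.
  def cancel_if_timed_out(thread, observed_entry)
    thread.interrupt if @threads_to_start_time.remove(thread, observed_entry)
  end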

original-brownbear commented 6 years ago

@andrewvc added some documentation on it here https://github.com/logstash-plugins/logstash-filter-grok/pull/125/commits/fcf77705d32a69a6a6ffeefee720737bd7c6b128 :)

original-brownbear commented 6 years ago

@andrewvc Thanks! Same test failures happen in master as well. Will look into them later today (it's my job this week anyhow :))

elasticsearch-bot commented 6 years ago

Armin Braun merged this into the following branches!

Branch: master
Commits: 02121f77ce25fa5ebb53dd3dbc6f2dd539f4d723, dfde844cc0650d4779485776942cfbb19871a4c7, cff80356f05794457474dfc8ffd76f5ed55070c8, ae6e64746d4f7d2441e658f532e9da19d5bde924

IrlJidel commented 6 years ago

@andrewvc Is 3.4.5 published to rubygems.org?

andrewvc commented 6 years ago

Yes, our internal workflow for doing so hit a snag. @original-brownbear can you look at it when you have a chance?

original-brownbear commented 6 years ago

@andrewvc it looks like this is a job for @jordansissel, not sure what I can do here. He's already been pinged, but I'll keep an eye on this so we won't forget to release this once things are repaired.

andrewvc commented 6 years ago

Published as 3.4.5

jsvd commented 6 years ago

As a correction, this was released as 4.0.0; version 3.4.5 was yanked from rubygems as it broke backwards compatibility (bwc) with Logstash 5.5.