logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

Memory leak on 5.x #135

Closed: jakelandis closed this issue 6 years ago

jakelandis commented 6 years ago

As described in https://github.com/elastic/logstash/issues/9127 , 5.5.6 introduced a memory leak. This change is the culprit: https://github.com/logstash-plugins/logstash-filter-grok/compare/v4.0.0...v4.0.1

While the change itself does not leak memory, it exercises a bug in JRuby 1.7 that leaks memory.

Specifically, the following code results in the leak:

@threads_to_start_time.forEach do |thread, start_time|
  # Use compute to lock this value
  @threads_to_start_time.computeIfPresent(thread) do |thread, start_time|
    if start_time < now && now - start_time > @timeout_nanos
      thread.interrupt
      nil # Delete the key
    else
      start_time # preserve the key
    end
  end
end
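The deletion mechanism in the block above relies on a documented `ConcurrentHashMap` behavior: a `computeIfPresent` remapping function that returns `null` atomically removes the entry, while returning a value keeps it. A minimal Java sketch of just that semantic (map name and values here are illustrative, not from the plugin):

```java
import java.util.concurrent.ConcurrentHashMap;

public class ComputeIfPresentDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> threadsToStartTime = new ConcurrentHashMap<>();
        threadsToStartTime.put("worker-1", 100L);

        // Returning the existing value preserves the entry
        // (the "start_time # preserve the key" branch).
        threadsToStartTime.computeIfPresent("worker-1", (k, v) -> v);
        System.out.println(threadsToStartTime.containsKey("worker-1")); // true

        // Returning null atomically removes the entry
        // (the "nil # Delete the key" branch).
        threadsToStartTime.computeIfPresent("worker-1", (k, v) -> null);
        System.out.println(threadsToStartTime.containsKey("worker-1")); // false
    }
}
```

Because the remapping runs under the bin lock for that key, the check-and-delete is race-free, which is why the plugin uses it instead of a plain `get`/`remove` pair.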

I was able to create a reproducible case outside of Logstash. The bug appears to affect ONLY JRuby 1.x (tested jruby-1.7.27), NOT JRuby 9k (tested 9.1.15); thus Logstash 6.x is not affected.

The following code will throw an OutOfMemoryError in ~2-3 minutes.

def mymethod
  test = java.util.concurrent.ConcurrentHashMap.new
  test.put("test", java.util.UUID.randomUUID())

  loop do
    test.forEach do |key, value|
      test.computeIfPresent("test") do |key, value|
        java.util.UUID.randomUUID()
      end
    end
  end
end

thread = Thread.new do
  mymethod
end

15.times do
  thread = Thread.new do
    mymethod
  end
end

thread.join

With the following memory signature:

[screenshot: memory usage graph]

Through trial and error, I discovered that this bug only presents when the outer loop is coded as a lambda block (`forEach`). I suspect the old version of JRuby (1.x) does not handle the lambda-in-lambda block correctly. Changing the outer loop from a lambda block to a standard Ruby `each` iteration seems to fix the issue.

loop do
  test.each do |entry|
    test.computeIfPresent(entry[0]) do |key, value|
      java.util.UUID.randomUUID()
    end
  end
end

This is a slow leak, and it may take days for a production deployment to see memory issues. It is compounded by the number of workers (default: number of cores), since each worker runs one of these loops.