typhoeus / ethon

Very simple libcurl wrapper.
MIT License

Memory leak in Ethon::Easy #198

Open itsrainy opened 3 years ago

itsrainy commented 3 years ago

I've tracked down a memory leak in one of our applications that uses Ethon. The simplest reproduction I've been able to find is by running the following (in an irb or pry shell):

require "ethon"
100000.times {
   Ethon::Easy.new
}

This causes the resident set size of the process to grow and never be reclaimed. The leak also seems to be consistently reproducible by running the following in the project root (taken from #45):

RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_HEAP_INIT_SLOTS=1 RUBY_HEAP_SLOTS_INCREMENT=10 RUBY_HEAP_SLOTS_GROWTH_FACTOR=.1 ITERATIONS=10000 rspec -I lib -I profile profile/memory_leaks.rb

_Note: commenting out everything in memory_leaks.rb except this specific test makes the run much quicker, since only the relevant memory leak test is executed._

The source of the leak appears to be this specific FFI::AutoPointer, which gets created when the Easy initializer calls set_callbacks. The second argument to the AutoPointer constructor is the Curl easy_cleanup method, which does seem to get called correctly.
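
For reference, the handle construction being described looks roughly like this (a simplified sketch of Ethon's Easy#handle, not a verbatim copy; the real code lives in lib/ethon/easy.rb):

require "ethon"

# Ethon wraps the raw pointer returned by curl_easy_init in an FFI::AutoPointer,
# passing Curl.easy_cleanup as the releaser, so curl_easy_cleanup is expected to
# run once Ruby garbage-collects the wrapper.
handle = FFI::AutoPointer.new(Ethon::Curl.easy_init, Ethon::Curl.method(:easy_cleanup))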

I'm unsure if this is an Ethon, FFI, or libcurl issue, so I figured I would start here. One thing I noticed is that the Easy.cleanup method doesn't ever get called anywhere. Adding a call to cleanup in the calling code seems to fix the leak:

require "ethon"
100000.times {
   e = Ethon::Easy.new
   e.cleanup
}

This fixes the leak (and may be how we patch our applications for the time being). Is that method supposed to be automatically called through some other mechanism or are consumers expected to call it?

I was also under the impression that FFI::AutoPointer handles should automatically get GC'd and shouldn't require a call to free. If that's true, this seems like it might be an FFI issue (unless there's some subtle reference to that handle floating around Ethon somewhere).
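
For anyone less familiar with the FFI contract mentioned above, here is a minimal standalone illustration of how FFI::AutoPointer is supposed to behave (a sketch using a dummy address and a puts releaser, not Ethon code):

require "ffi"

# The releaser is invoked with the wrapped pointer, either when the AutoPointer
# becomes unreachable and is garbage-collected, or when #free is called explicitly.
releaser = proc { |pointer| puts "releasing #{pointer}" } # stand-in for curl_easy_cleanup

auto = FFI::AutoPointer.new(FFI::Pointer.new(0x1234), releaser)
auto.free # eager release, analogous to calling Easy#cleanup yourself
# Without the explicit #free, the releaser should still run eventually,
# once the GC finalizes the AutoPointer.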

This reproduces on macOS Big Sur 11.0.3 and on Ubuntu 16.04.6 (ruby 2.7.3, ffi 1.15.3, ethon 0.14.0).

gyfis commented 3 years ago

We're seeing the same memory leak after upgrading ethon (0.12.0 -> 0.14.0) and ffi (1.13.1 -> 1.15.3).

Some minor info: I tried using ethon v0.12.0 with ffi 1.14.0, and I see the same memory leak from the specific test mentioned in the previous comment, so I believe this to be an issue with ffi (or something lower in the stack).

tagliala commented 3 years ago

Hi, is this by any chance related to #194, fixed in #195? Could you try with the master branch and provide feedback?
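
For anyone who wants to test that, one way to point a project at the unreleased master branch (a sketch; adjust to your own Gemfile):

# Gemfile
gem 'ethon', git: 'https://github.com/typhoeus/ethon.git', branch: 'master'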

gyfis commented 3 years ago

@tagliala Hey, thanks for the reply! I tested the memory_leaks.rb init test on the ethon master branch with ffi 1.15.3 and I'm still getting the memory leak, so I don't think that's related.

tagliala commented 3 years ago

@gyfis thanks

Did you get a failing test on 0.14.0 by running

RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_HEAP_INIT_SLOTS=1 RUBY_HEAP_SLOTS_INCREMENT=10 RUBY_HEAP_SLOTS_GROWTH_FACTOR=.1 ITERATIONS=10000 rspec -I lib -I profile profile/memory_leaks.rb

?

init does not fail on my machine (x64 / 16GB / macOS 11.5.1 / ruby 2.7.3 / ffi 1.15.3 / ethon 0.14.0 / curl 7.64.1)

Anyway, I've tried to adapt my standalone test for #194 to check this one:

#!/usr/bin/env ruby

# frozen_string_literal: true

ETHON_VERSION = '0.14.0' # { git: 'https://github.com/typhoeus/ethon.git' }

require 'bundler/inline'

begin
  puts "Ruby version: #{RUBY_VERSION}\n\n"

  gemfile(true) do
    source 'https://rubygems.org'

    gem 'ethon', ETHON_VERSION
  end
rescue Gem::LoadError => e
  puts "\nMissing Dependency:\n#{e.backtrace.first} #{e.message}"
rescue LoadError => e
  puts "\nError:\n#{e.backtrace.first} #{e.message}"
  puts DATA.read
  exit 1
end

require 'ethon'

$cleanup_counter = 0

# Monkey patching to provide debugging information
module Ethon
  module Curl
    module_function

    def easy_cleanup(pointer)
      $cleanup_counter += 1
      puts "***** Curl.easy_cleanup called with #{pointer}"
      puts "Cleanup counter: #{$cleanup_counter}"
    end
  end
end

10000.times do
  Ethon::Easy.new
end

Run with:

RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_HEAP_INIT_SLOTS=1 RUBY_HEAP_SLOTS_INCREMENT=10 RUBY_HEAP_SLOTS_GROWTH_FACTOR=.1 ./test.rb

If I run it against the master branch (change ETHON_VERSION to the commented value), cleanup is called the expected number of times (10,000); on 0.14.0, cleanup is called a random number of times, fewer than 10,000.

So maybe it does not fix the memory leak completely, but it should help.
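
As a rough follow-up check, one could append something like this to the script above to see whether the remaining releasers fire after a forced GC (GC.start does not guarantee that FFI finalizers run synchronously, so the count is only indicative):

# After the allocation loop: force a GC and compare the counter with the
# number of handles that were created.
GC.start
sleep 0.5 # give any deferred finalizers a moment to run
puts "cleanup calls after GC: #{$cleanup_counter} of 10000"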

jarthod commented 3 years ago

Hi all :wave:

I also encountered this leak in my product, right after upgrading from ethon 0.12.0 to 0.14.0:

[memory usage graph]

I tried 0.13.0 too and noticed the problem was also present there. @tagliala IMO this is related to #194, yes, because I noticed various errors caused by the lack of available file descriptors in addition to the memory problems (e.g. Errno::EMFILE: Too many open files, Resource temporarily unavailable - getaddrinfo). When trying to reproduce manually using Typhoeus in verbose mode I caught this error message:

Couldn't resolve host 'xxxx'
Closing connection 0
Dgetaddrinfo() thread failed to start

Couldn't resolve host 'yyyyy'
Closing connection 0
Dgetaddrinfo() thread failed to start

Couldn't resolve host 'zzzz'
Closing connection 0
Dgetaddrinfo() thread failed to start

The "thread failed to start" error is likely due to FD saturation, which then surfaces as a "Couldn't resolve host" error in libcurl's or Typhoeus's error-handling code.

I tried locally with the master branch and didn't reproduce the problem (though I couldn't confirm the memory behaviour, because I hit the FD limits too early to reproduce it locally), so it looks better in master at least. I'm a bit hesitant to try this in production though.

Edit: I managed to reproduce the leak locally by raising my FD limits. After running my script (which makes 10k Typhoeus requests in a 60-thread pool), with 0.12.0 it uses 1.7GB of RAM at the end, with 0.14.0 it uses :warning: 5.7GB of RAM, and with master it uses 1.7GB. So the revert in master looks good to me.
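
For context, a rough sketch of the kind of load script described above (assumptions: the typhoeus gem is installed and https://example.com/ stands in for the real endpoints; this is not the actual script used for the measurements):

require "typhoeus"

# Fire 10,000 requests from a pool of 60 threads, then print the resident set size.
queue = Queue.new
10_000.times { queue << "https://example.com/" }

threads = Array.new(60) do
  Thread.new do
    loop do
      url = begin
        queue.pop(true) # non-blocking pop; raises ThreadError when the queue is empty
      rescue ThreadError
        break
      end
      Typhoeus.get(url, timeout: 5)
    end
  end
end
threads.each(&:join)

puts "RSS: #{`ps -o rss= -p #{Process.pid}`.strip} KB"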

BuonOmo commented 2 years ago

We're having the same problem at my company. I've tested the current master version (b593a9bdc770431ce3357e1c5737700460ff09d8) against 0.14.0 and 0.12.0. Locally, it seems that only 0.12.0 does not leak. On Heroku, however, we keep seeing a leak. I'll try to reduce it to a minimal example soon.

EDIT: We solved our issue by using a smaller batch size in Active Record, so this may very well have been a false lead; sorry for that.

On Heroku (0.14.0 then b593a9bdc then 0.12.0)

[Heroku memory usage graph]

0.14.0

Retained STRING 15 objects of size 600/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.14.0/lib/ethon/easy/callbacks.rb:26
Retained STRING 11 objects of size 440/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.14.0/lib/ethon/easy/callbacks.rb:27
Retained OBJECT 11 objects of size 440/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.14.0/lib/ethon/easy/callbacks.rb:29
Retained ARRAY 10 objects of size 400/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.14.0/lib/ethon/easy/debug_info.rb:23
Retained DATA 4 objects of size 240/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.14.0/lib/ethon/easy/header.rb:32
Retained ARRAY 2 objects of size 80/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ffi-1.15.3/lib/ffi/autopointer.rb:96
Retained STRING 1 objects of size 40/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/activerecord-import-1.0.1/lib/activerecord-import/import.rb:579
Retained IMEMO 1 objects of size 40/2280 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/2.7.0/open-uri.rb:16

b593a9bdc770431ce3357e1c5737700460ff09d8

Retained ARRAY 17 objects of size 680/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ffi-1.15.4/lib/ffi/autopointer.rb:96
Retained DATA 6 objects of size 240/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/ethon-b593a9bdc770/lib/ethon/easy/header.rb:28
Retained DATA 5 objects of size 360/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/ethon-b593a9bdc770/lib/ethon/easy/header.rb:32
Retained OBJECT 4 objects of size 160/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ffi-1.15.4/lib/ffi/autopointer.rb:87
Retained ARRAY 3 objects of size 120/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/2.7.0/open-uri.rb:21
Retained HASH 3 objects of size 376/12096 (in bytes) at: /Users/ulysse/Dev/klaxit/klaxit-otp-master/app/services/find_new_itineraries_available_iterator.rb:64
Retained HASH 2 objects of size 80/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/2.7.0/open-uri.rb:21
Retained DATA 2 objects of size 144/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/ethon-b593a9bdc770/lib/ethon/multi/operations.rb:17
Retained STRING 2 objects of size 80/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/ethon-b593a9bdc770/lib/ethon/easy/callbacks.rb:27
Retained IMEMO 1 objects of size 40/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/2.7.0/open-uri.rb:21
Retained IMEMO 1 objects of size 40/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/2.7.0/open-uri.rb:16
Retained STRING 1 objects of size 40/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/ethon-b593a9bdc770/lib/ethon/easy/callbacks.rb:26
Retained ARRAY 1 objects of size 40/12096 (in bytes) at: /Users/ulysse/Dev/klaxit/klaxit-otp-master/app/services/find_new_itineraries_available_iterator.rb:64
Retained OBJECT 1 objects of size 40/12096 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/ethon-b593a9bdc770/lib/ethon/easy/callbacks.rb:29

0.12.0

Retained DATA 3 objects of size 200/400 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.12.0/lib/ethon/easy/header.rb:31
Retained DATA 2 objects of size 80/400 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ethon-0.12.0/lib/ethon/easy/header.rb:27
Retained ARRAY 2 objects of size 80/400 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ffi-1.15.4/lib/ffi/autopointer.rb:96
Retained OBJECT 1 objects of size 40/400 (in bytes) at: /Users/ulysse/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/ffi-1.15.4/lib/ffi/autopointer.rb:87
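
The retained-object listings above look like memory_profiler output; for reference, a sketch of how such a report can be produced (assuming the memory_profiler gem; this is not the exact script used for the numbers above):

require "memory_profiler"
require "ethon"

# Collect allocation and retention statistics for a batch of Easy handles and
# print the retained objects grouped by allocation site.
report = MemoryProfiler.report do
  1_000.times { Ethon::Easy.new }
end
report.pretty_print
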
maxvidenin commented 2 years ago

We encountered the same issue recently. Our memory usage graph confirmed that there is a memory leak: memory usage grows constantly and never goes down, as shown in the graph below:

[memory usage graph, screenshot 2021-11-12 at 18:01:32]

The problem appeared when we updated the gem version from 0.12.0 to 0.14.0.

Downgrading the gem back to 0.12.0 or updating to 0.15.0 fixes the issue. See the memory usage graph after updating from 0.14.0 to 0.15.0 below:

[memory usage graph after updating to 0.15.0, screenshot 2021-11-15 at 13:04:11]

tagliala commented 2 years ago

So I think that this could be closed as a duplicate of #194, solved in #195, and fixed in 0.15.0
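
For anyone landing on this issue later, one way to express that in a Gemfile (a suggestion, not something prescribed in this thread; per the reports above only 0.13.0 and 0.14.0 are affected):

# Gemfile: stay on a release that contains the fix from #195
gem 'ethon', '>= 0.15.0'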

jarthod commented 2 years ago

Yes, but maybe 0.13 and 0.14 should be yanked, to avoid people installing them?

tagliala commented 2 years ago

Yes, but maybe 0.13 and 0.14 should be yanked, to avoid people installing them?

Hi, I used to yank gems in case of major issues or unwanted breaking changes, but I've stopped because yanking may break production deploys of third-party applications. For example, you need an urgent hotfix on your application, but the deploy fails because the locked ethon version has been yanked. Upgrading ethon is something you would address eventually, but not while you are fixing your own application.

Based on this experience, I would not yank

jarthod commented 2 years ago

Ah yes indeed, the sad world of production deploys re-downloading everything ;) Makes sense, fine for me. I've submitted the report to https://github.com/rubymem/ruby-mem-advisory-db/pull/31 (like bundle audit but for memory leaks) to at least warn the people using this, but that's not a lot of people ^^

tagliala commented 2 years ago

Hi, do you think this is still an issue?

The memory leak reported here and confirmed in version 0.13.0 and 0.14.0 should have been fixed in 0.15.0

gyfis commented 2 years ago

Hi, do you think this is still an issue?

The memory leak reported here and confirmed in version 0.13.0 and 0.14.0 should have been fixed in 0.15.0

We’re running ethon 0.15.0 with no more memory leaks present, +1 on not being an issue anymore.

jarthod commented 2 years ago

+1 for me too, no memory problem since 0.15.0