taf2 / curb

Ruby bindings for libcurl

CLOSE_WAIT, too many open files #277

Closed heaven closed 5 years ago

heaven commented 8 years ago

This code:

easy = Curl::Easy.new(site.rss_url)
easy.ssl_verify_peer = false
easy.follow_location = true
easy.max_redirects = 5
easy.connect_timeout = 15
easy.timeout = 20
easy.perform

leaks connections.

After some time I am getting errors from sidekiq: Error fetching job: Too many open files - getaddrinfo.

$ netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c:

   4024 CLOSE_WAIT
    163 ESTABLISHED
     12 LISTEN
     85 TIME_WAIT

$ lsof -p 6245 | grep CLOSE_WAIT | wc -l

4020
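
The same tally can be produced from Ruby, which is handy for logging from inside a worker. A minimal sketch, assuming netstat-style text where the TCP state is the sixth whitespace-separated column; `tally_tcp_states` and the sample text are made up for illustration and are not output from the server above:

```ruby
# Hypothetical helper: counts TCP connection states in `netstat -an` style output.
# Assumes the state is the 6th whitespace-separated field on lines starting with "tcp".
def tally_tcp_states(netstat_output)
  netstat_output.each_line.with_object(Hash.new(0)) do |line, counts|
    fields = line.split
    next unless fields[0].to_s.start_with?("tcp")
    state = fields[5]
    counts[state] += 1 if state
  end
end

# Illustrative sample only.
SAMPLE_NETSTAT = <<~OUT
  tcp        0      0 10.0.0.1:38806     10.0.0.2:80       CLOSE_WAIT
  tcp        0      0 10.0.0.1:42963     10.0.0.3:80       CLOSE_WAIT
  tcp        0      0 10.0.0.1:22        10.0.0.4:51000    ESTABLISHED
  tcp        0      0 0.0.0.0:22         0.0.0.0:*         LISTEN
OUT

p tally_tcp_states(SAMPLE_NETSTAT)
```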

A part of lsof output:

ruby    6245 deploy 4085u  IPv4 445657907         0t0       TCP ...:38806->theautismeducationsite.com:http (CLOSE_WAIT)
ruby    6245 deploy 4086u  IPv4 445657898         0t0       TCP ...:42963->qm-in-f132.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4087u  IPv4 445657912         0t0       TCP ...:45454->qm-in-f118.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4088u  IPv4 445657915         0t0       TCP ...:49490->192.0.78.13:https (CLOSE_WAIT)
ruby    6245 deploy 4089u  IPv4 445661545         0t0       TCP ...:52097->qm-in-f121.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4090u  IPv4 445657929         0t0       TCP ...:42984->qm-in-f132.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4091u  IPv4 445658814         0t0       TCP ...:42973->qm-in-f132.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4092u  IPv4 445648740         0t0       TCP ...:60335->ps478203.dreamhost.com:http (CLOSE_WAIT)
ruby    6245 deploy 4093u  IPv4 445659562         0t0       TCP ...:42985->qm-in-f132.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4094u  IPv4 445659564         0t0       TCP ...:43010->qm-in-f132.1e100.net:http (CLOSE_WAIT)
ruby    6245 deploy 4095u  IPv4 445648749         0t0       TCP ...:43009->qm-in-f132.1e100.net:http (CLOSE_WAIT)

taf2 commented 8 years ago

After easy.perform, if you run easy.close, do you still see the open file issue? The close should happen during GC, but if you're not having any GC runs then it would not close those connections... perhaps?
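
One way to make sure close always runs, even when the request raises, is an ensure block. A minimal sketch, assuming only that the handle responds to close; `with_closing` and `FakeHandle` are hypothetical names, and the fake stands in for a real handle so the snippet runs without curb or a network:

```ruby
# Hypothetical helper: yields the handle, then always calls #close on it,
# even if the block raises.
def with_closing(handle)
  yield handle
ensure
  handle.close
end

# Stand-in for a real handle so the sketch runs without curb or a network.
class FakeHandle
  attr_reader :closed

  def initialize
    @closed = false
  end

  def perform
    :performed
  end

  def close
    @closed = true
  end
end

handle = FakeHandle.new
result = with_closing(handle) { |h| h.perform }
puts "result=#{result} closed=#{handle.closed}"
```

With curb this would be called as with_closing(Curl::Easy.new(url)) { |easy| easy.perform }.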

heaven commented 8 years ago

I do have GC running; I even run it manually at some points. I will try the easy.close suggestion and let you know.

nubis commented 8 years ago

I think I just ran into this issue as well (using Curl::Easy). I'm not calling 'perform'; I use 'http(:get)' instead.

Initially, just running my script would immediately start piling up sockets in the ESTAB and CLOSE-WAIT states (as reported by the 'ss' utility on my Ubuntu box).

After adding an explicit call to 'curl.close' after being done with my curl object I got my script to run in constant file descriptor space. But this just lasted a few minutes, then it started piling up sockets in the CLOSE-WAIT state again.

Finally, I added a 'GC.start' call after calling 'close' on my curl object and it seems to have fixed it for good.

It was an ugly fix though. I just want to keep using curl/curb in my library, but I'm afraid switching may be easier for me than fixing this all by myself (I have never read curl code, or native Ruby extensions for that matter). I can certainly help diagnose this; I know I can reliably reproduce the issue by just removing those fixes.

Cheers!

taf2 commented 8 years ago

Well, it's good to hear that close + GC.start works around the issue. It makes sense, because the socket is only closed in a GC cycle. I think there is also a keepalive setting in curl we can set to resolve this, which would be less "ugly", to your point. After a quick Google search, however, it looks like the issue is related to this bug report: http://sourceforge.net/p/curl/bugs/1010/

thomax commented 8 years ago

I use Curl::Easy for large amounts of traffic, and am also getting a pile-up of sockets in the CLOSE_WAIT state. Solving this by closing the connection and opening a new one after every get would be costly. Is there any other solution on the horizon, @taf2?

taf2 commented 8 years ago

It looks like this might be related to a bad version of libcurl? Did you check the bug report, or see this: https://curl.haxx.se/mail/lib-2009-03/0009.html

thomax commented 8 years ago

Thanks for the tip @taf2 ! I'll look into that!

stayhero commented 8 years ago

I'm running into this as well since we basically upgraded to Ruby 2.3.0 and curb from 0.8.5 to 0.9.1.

We use Ubuntu LTS 14.04 with libcurl 7.35 (latest Ubuntu 14.04 packaged version). Is the suggested solution to compile libcurl manually on latest Ubuntu LTS? Or is it possible a Ruby downgrade would help (because probably Ruby 2.3 has changed their GC behavior etc)? To be honest I'm a bit hesitant to install libcurl manually instead of using the default Ubuntu apt-repos. ;)

stayhero commented 8 years ago

BTW: the issue described here https://curl.haxx.se/mail/lib-2009-03/0009.html should have been fixed in libcurl 7.20 at least? Hence I guess this is not the problem here?

stayhero commented 8 years ago

@thomax @heaven @nubis Did you run into it in Ruby versions < 2.3?

thomax commented 8 years ago

@stayhero Yeah, I'm having this problem on Ruby v2.2.3p173 and I'm sure our libcurl is patched.

thomax commented 8 years ago

Could this be a problem with ruby garbage collection, rather than curb/curl?

nubis commented 8 years ago

Ruby 2.2.1p85 when this happened.

thomax commented 8 years ago

We've solved this problem in our codebase. We used to just grab a new Curl::Easy instance for each request, resulting in the aforementioned issue. Our workaround was to always reuse and reset the Curl::Easy instance, something along these lines:

def self.get(url = nil, params = nil, &block)
  url, params = url_and_params_from_args(url, params, &block)
  return with_curl do |easy|
    easy.url = url_with_params(url, params)
    easy.http_get
  end
end

def self.with_curl(&block)
  easy = Thread.current[:pebblebed_curb_easy] ||= Curl::Easy.new
  easy.reset
  yield easy
  return handle_http_errors(Response.new(easy))
end

def self.handle_http_errors(response)
  if response.status == 404
    errmsg = "Resource not found: '#{response.url}'"
    errmsg << extract_error_summary(response.body)
    raise HttpNotFoundError.new(ActiveSupport::SafeBuffer.new(errmsg), response.status)
  elsif response.status >= 400
    errmsg = "Service request to '#{response.url}' failed (#{response.status}):"
    errmsg << extract_error_summary(response.body)
    raise HttpError.new(ActiveSupport::SafeBuffer.new(errmsg), response.status, response)
  end
  response
end

taf2 commented 8 years ago

@thomax you should look at using the following methods as they internally use Thread.current to reuse a curl handle:

Curl.get Curl.post
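
The reuse pattern described here can be sketched in a few lines. This illustrates the idea only and is not curb's actual implementation; `thread_local_handle` and the `:example_easy_handle` key are hypothetical, and a plain Object stands in for Curl::Easy.new:

```ruby
# Hypothetical helper: memoizes one handle per thread and returns it on every call.
# Note: Thread.current[] storage is fiber-local in Ruby, so each Fiber gets its own slot too.
def thread_local_handle(key = :example_easy_handle)
  Thread.current[key] ||= yield
end

first  = thread_local_handle { Object.new }
second = thread_local_handle { Object.new } # block is skipped; the cached object is returned
other  = Thread.new { thread_local_handle { Object.new } }.value

puts first.equal?(second) # same thread, same handle
puts first.equal?(other)  # different thread, different handle
```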

thomax commented 8 years ago

@taf2 Thanks for the suggestion, I'll look into that!

Would it be unexpected behaviour if Curl::Easy just wrapped Curl.method and quietly spared me the trouble of worrying about the pileup of unusable sockets?

taf2 commented 8 years ago

Possibly. Curl::Easy lets you get a new libcurl easy handle. I'm not exactly sure what is causing your issue... Do you have sample code to recreate it?

mikaelhm commented 8 years ago

I also have this issue with code like this:

def post_request(url, request_body)
  request = Curl::Easy.new(url)
  request.headers['Content-type'] = 'application/json'
  request.certtype = 'PEM'
  request.cert = "<certificate-string>"
  request.cacert = "<root-cert-string>"
  request.cert_key = "<private-key-string>"
  request.http_post(request_body)

  return request.body_str
end

I tried @nubis's suggestion with GC.start and it works:

def post_request(url, request_body)
  request = Curl::Easy.new(url)
  request.headers['Content-type'] = 'application/json'
  request.certtype = 'PEM'
  request.cert = "<certificate-string>"
  request.cacert = "<root-cert-string>"
  request.cert_key = "<private-key-string>"
  request.http_post(request_body)
  response_body = request.body_str
  request.close
  GC.start

  return response_body
end

System:

levilansing commented 8 years ago

+1 I also have this problem using Curb inside a thin service. Even if I use Curl.post/get AND explicitly close the connections AND run GC.start occasionally, that only reduces the problem; it still seems to occur eventually.

Symptoms of failure are weird. Either a "Couldn't resolve host name (Curl::Err::HostResolutionError)" or sometimes it causes a Segmentation fault.

taf2 commented 8 years ago

@mikaelhm the GC.start is probably not necessary, only request.close, to ensure curb tells libcurl to close the handle, which would eventually be closed by GC anyway. Calling GC.start after close just runs GC immediately and frees your existing Ruby objects, which might or might not be good. In unicorn, for example, you can put that GC.start outside the request, which might be better...
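
If a forced GC is needed at all, it does not have to run on every request. A rough sketch of forcing it once every N calls; `PeriodicGC` and the interval of 3 are hypothetical, and whether this suits a given server is something to measure rather than assume:

```ruby
# Hypothetical helper: forces a GC run once every `interval` ticks
# instead of on every request.
class PeriodicGC
  def initialize(interval)
    @interval = interval
    @ticks = 0
    @lock = Mutex.new
  end

  # Call once per finished request; returns true when a GC run was forced.
  def tick
    due = @lock.synchronize do
      @ticks += 1
      (@ticks % @interval).zero?
    end
    GC.start if due
    due
  end
end

gc = PeriodicGC.new(3)
forced = 6.times.map { gc.tick }
p forced
```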

@levilansing mixing Curl.post/Curl.get with close may not be what you want? Curl.post/Curl.get will put a single Curl::Easy handle into Thread.current, meaning the handle is shared and reused between requests. The main benefit of using it is to share the existing connection.

Thinking about this issue, I suspect we might need to do something internally to force the handles to be freed, or I wonder if the handles are piling up in the multi handle. It might be interesting to get the value of handle.mult.idle?

I've made a commit here: 2509064..e3126a8

That clears out the multi request handle early. I think this might be the solution, please let me know.

mikaelhm commented 8 years ago

@taf2 I tried with just request.close before reading about the GC.start workaround. In my case the problem wasn't fixed until I introduced the GC.start workaround.

I will try to remove it again and upgrade curb to 0.9.2 and see if your attempt fixed it.

taf2 commented 8 years ago

@mikaelhm I bet the GC.start is triggering the multi handle to close, and that could be the leak. If so, 0.9.2 will fix it, unless of course there are other leaks :(

mikaelhm commented 8 years ago

I will let you know. Thanks for attempting to fix the leak.

mikaelhm commented 8 years ago

@taf2 I cannot test the fix in 0.9.2: https://github.com/taf2/curb/issues/296

mikaelhm commented 8 years ago

After a day's use, I feel confident that @taf2 fixed the memory/socket leak.

0.9.3 fixed my issues.

stayhero commented 8 years ago

0.9.3 solved our issues as well.

ta commented 7 years ago

We ran into this issue today as well with curb 0.9.4. The GC.start workaround seems to have fixed the issue.

Environment:

$ uname -srv
Linux 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017

$ dpkg -l | grep libcurl4-openssl-dev
ii  libcurl4-openssl-dev:amd64           7.35.0-1ubuntu2.10

$ ruby -v
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux-gnu]

$ bundle list | grep curb
  * curb (0.9.4)

Code to reproduce:

100.times { curl = Curl::Easy.new("http://google.com"); curl.get }

After a few minutes: $ netstat -an | grep "CLOSE_WAIT" | wc -l (yielded +100 in our tests before workaround and yielded 2 after workaround)
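
The same kind of check can be made from inside the Ruby process. A Linux-only sketch, assuming /proc/self/fd is available; `open_socket_fd_count` is a hypothetical helper, and it counts every socket the process holds, not only those in CLOSE_WAIT:

```ruby
require "socket"

# Hypothetical helper: counts file descriptors of the current process that are sockets.
# Linux-only: relies on /proc/self/fd; returns nil where that directory does not exist.
def open_socket_fd_count
  fd_dir = "/proc/self/fd"
  return nil unless File.directory?(fd_dir)

  Dir.children(fd_dir).count do |fd|
    begin
      File.readlink(File.join(fd_dir, fd)).start_with?("socket:")
    rescue SystemCallError
      false # the descriptor was closed while we were listing
    end
  end
end

before = open_socket_fd_count
left, right = UNIXSocket.pair # two connected sockets, no network needed
after = open_socket_fd_count
left.close
right.close

puts "sockets before=#{before.inspect} after=#{after.inspect}"
```

Logging this number before and after a batch of requests shows whether descriptors are accumulating.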

pts-owentran commented 7 years ago

Hi. For the past two weeks, I've been chasing down randomly growing memory on my Rails Puma server, where file descriptors are held by connections in CLOSE_WAIT from curb requests, after upgrading from 0.8.8 to 0.9.4. GC.stat shows a growing number of heap_final_slots, indicating objects waiting to be finalized but never getting GC'ed. Once this state is triggered, the heap grows unbounded until the worker crashes.

[28864] [2017-09-27 05:58:37 +0000] {:count=>108, :heap_allocated_pages=>6811, :heap_sorted_length=>7014, :heap_allocatable_pages=>0, :heap_available_slots=>2776137, :heap_live_slots=>2035764, :heap_free_slots=>740373, :heap_final_slots=>0, :heap_marked_slots=>788219, :heap_swept_slots=>141653, :heap_eden_pages=>4995, :heap_tomb_pages=>1816, :total_allocated_pages=>6811, :total_freed_pages=>0, :total_allocated_objects=>61074009, :total_freed_objects=>59038245, :malloc_increase_bytes=>24043216, :malloc_increase_bytes_limit=>67108864, :minor_gc_count=>76, :major_gc_count=>32, :remembered_wb_unprotected_objects=>35460, :remembered_wb_unprotected_objects_limit=>41569, :old_objects=>720516, :old_objects_limit=>850914, :oldmalloc_increase_bytes=>88012080, :oldmalloc_increase_bytes_limit=>131586007}
[28864] [2017-09-27 05:59:37 +0000] {:count=>108, :heap_allocated_pages=>6811, :heap_sorted_length=>7014, :heap_allocatable_pages=>0, :heap_available_slots=>2776137, :heap_live_slots=>2026272, :heap_free_slots=>749865, :heap_final_slots=>0, :heap_marked_slots=>788219, :heap_swept_slots=>311425, :heap_eden_pages=>4973, :heap_tomb_pages=>1838, :total_allocated_pages=>6811, :total_freed_pages=>0, :total_allocated_objects=>61234292, :total_freed_objects=>59208020, :malloc_increase_bytes=>25442760, :malloc_increase_bytes_limit=>67108864, :minor_gc_count=>76, :major_gc_count=>32, :remembered_wb_unprotected_objects=>35460, :remembered_wb_unprotected_objects_limit=>41569, :old_objects=>720516, :old_objects_limit=>850914, :oldmalloc_increase_bytes=>89411624, :oldmalloc_increase_bytes_limit=>131586007}
[28864] [2017-09-27 06:00:37 +0000] {:count=>108, :heap_allocated_pages=>6811, :heap_sorted_length=>7014, :heap_allocatable_pages=>0,   :heap_available_slots=>2776137, :heap_live_slots=>2026395, :heap_free_slots=>749742,  :heap_final_slots=>0,    :heap_marked_slots=>788219, :heap_swept_slots=>517073, :heap_eden_pages=>4973, :heap_tomb_pages=>1838, :total_allocated_pages=>6811, :total_freed_pages=>0, :total_allocated_objects=>61440066, :total_freed_objects=>59413671, :malloc_increase_bytes=>46605192, :malloc_increase_bytes_limit=>67108864, :minor_gc_count=>76, :major_gc_count=>32, :remembered_wb_unprotected_objects=>35460, :remembered_wb_unprotected_objects_limit=>41569, :old_objects=>720516, :old_objects_limit=>850914, :oldmalloc_increase_bytes=>110574040, :oldmalloc_increase_bytes_limit=>131586007}
[28864] [2017-09-27 06:01:37 +0000] {:count=>109, :heap_allocated_pages=>6811, :heap_sorted_length=>7014, :heap_allocatable_pages=>142, :heap_available_slots=>2776137, :heap_live_slots=>1714003, :heap_free_slots=>1060659, :heap_final_slots=>1475, :heap_marked_slots=>789481, :heap_swept_slots=>67690,  :heap_eden_pages=>4949, :heap_tomb_pages=>1862, :total_allocated_pages=>6811, :total_freed_pages=>0, :total_allocated_objects=>61894234, :total_freed_objects=>60178756, :malloc_increase_bytes=>48688,    :malloc_increase_bytes_limit=>67108864, :minor_gc_count=>77, :major_gc_count=>32, :remembered_wb_unprotected_objects=>35587, :remembered_wb_unprotected_objects_limit=>41569, :old_objects=>730902, :old_objects_limit=>850914, :oldmalloc_increase_bytes=>130802344, :oldmalloc_increase_bytes_limit=>131586007}
[28864] [2017-09-27 06:02:37 +0000] {:count=>109, :heap_allocated_pages=>6811, :heap_sorted_length=>7014, :heap_allocatable_pages=>0, :heap_available_slots=>2776137, :heap_live_slots=>1702113, :heap_free_slots=>1071494, :heap_final_slots=>2530, :heap_marked_slots=>789481, :heap_swept_slots=>389097, :heap_eden_pages=>4922, :heap_tomb_pages=>1889, :total_allocated_pages=>6811, :total_freed_pages=>0, :total_allocated_objects=>62204483, :total_freed_objects=>60499840, :malloc_increase_bytes=>25402264, :malloc_increase_bytes_limit=>67108864, :minor_gc_count=>77, :major_gc_count=>32, :remembered_wb_unprotected_objects=>35587, :remembered_wb_unprotected_objects_limit=>41569, :old_objects=>730902, :old_objects_limit=>850914, :oldmalloc_increase_bytes=>131649888, :oldmalloc_increase_bytes_limit=>131586007}
[28864] [2017-09-27 06:03:37 +0000] {:count=>109, :heap_allocated_pages=>6811, :heap_sorted_length=>7014, :heap_allocatable_pages=>0, :heap_available_slots=>2776137, :heap_live_slots=>1701008, :heap_free_slots=>1071013, :heap_final_slots=>4116, :heap_marked_slots=>789481, :heap_swept_slots=>824320, :heap_eden_pages=>4921, :heap_tomb_pages=>1890, :total_allocated_pages=>6811, :total_freed_pages=>0, :total_allocated_objects=>62639288, :total_freed_objects=>60934164, :malloc_increase_bytes=>10043088, :malloc_increase_bytes_limit=>67108864, :minor_gc_count=>77, :major_gc_count=>32, :remembered_wb_unprotected_objects=>35587, :remembered_wb_unprotected_objects_limit=>41569, :old_objects=>730902, :old_objects_limit=>850914, :oldmalloc_increase_bytes=>116290712, :oldmalloc_increase_bytes_limit=>131586007}
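
Log lines like these can be produced by sampling GC.stat periodically. A small sketch that picks out a few of the keys shown above; `gc_snapshot` and `GC_KEYS` are hypothetical names, and since the exact key set varies between Ruby versions, missing keys are skipped:

```ruby
# Hypothetical helper: returns a small subset of GC.stat for periodic logging.
# heap_final_slots is the number of slots with finalizers still pending.
GC_KEYS = %i[count heap_live_slots heap_free_slots heap_final_slots].freeze

def gc_snapshot
  stats = GC.stat
  GC_KEYS.each_with_object({}) do |key, out|
    out[key] = stats[key] if stats.key?(key)
  end
end

puts "[#{Process.pid}] [#{Time.now}] #{gc_snapshot}"
```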

I've bought myself some time with the puma-killer gem but want to solve the underlying issue. Calling close in an ensure block seems to do nothing. Using GC.start seems to clear any connections but GC time per request can be anywhere from 100-200 ms according to my NewRelic monitoring.

I am attempting to roll back to version 0.8.8 to see if the issue still manifests.

pts-owentran commented 7 years ago

@ta I was unable to reproduce any CLOSE_WAIT with version 0.9.4 on my vagrant box with almost the same environment as you:

$ uname -srv
Linux 3.13.0-128-generic #177-Ubuntu SMP Tue Aug 8 11:40:23 UTC 2017

$ dpkg -l | grep libcurl4-openssl-dev
ii  libcurl4-openssl-dev:amd64           7.35.0-1ubuntu2.10                         amd64        

$ ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]

$ bundle list | grep curb
  * curb (0.9.4)

I did observe that running the code block 100 times creates about 100 ESTABLISHED connections that won't go away until GC is called, regardless of whether you call close. With GC.start added, connections are cleared (verified by doing lsof | grep ESTABLISHED | wc -l).

So GC.start will fix the issue at the cost of much higher response time. I will report back on how our production site reacts. It normally takes 4-6 hours under moderate traffic for CLOSE_WAIT connections to start appearing.
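
The cost of a forced GC can be measured directly rather than inferred from request timings. A minimal sketch using the monotonic clock; `timed_gc_start_ms` is a hypothetical name, and the number it returns depends entirely on heap size, so no typical value is implied:

```ruby
# Hypothetical helper: runs a full GC and returns how long it took, in milliseconds.
def timed_gc_start_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.start
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
end

puts format("GC.start took %.2f ms", timed_gc_start_ms)
```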

ta commented 7 years ago

@pts-owentran The problem only surfaced in our environment after upgrading from ruby 2.2.x to 2.4.x (we skipped 2.3.x).

pts-owentran commented 7 years ago

Confirmed that 0.8.8 works fine after 24 hours. Heap is stable and no CLOSE_WAIT connections leaking. I think the memory leak from 0.9.3 is still there and not fixed in 0.9.4. Please let me know how I can help, we will stay on version 0.8.8 for now.

taf2 commented 5 years ago

@pts-owentran thank you for this and sorry for the long delay. Doing more tests on this and wondering if you can try the latest master?

pts-owentran commented 5 years ago

Hi, thanks for checking. I was looking through the change log and didn't see anything post 0.9.4 to the current 0.9.8 that would have resolved the issue. Any reason why you think it would work in master? We could only reproduce the issue in production and had no way to reliably create the issue other than lots of traffic.

morhekil commented 5 years ago

We've been hit by what looks like the same bug recently, too, after an upgrade of curb from 0.9.4 to 0.9.7. Curb is the only thing that changed, and the app is running on ruby 2.5.1p57 at the moment.

After the upgrade we started seeing our workers occasionally opening hundreds of connections when under load, which leads to them exhausting system ulimits fairly fast.

Downgrading back to 0.9.4 fixes the issue, so it looks like something that changed between these two versions causes this issue to manifest.

pts-owentran commented 5 years ago

Hi @morhekil, I saw the issue with 0.9.4, so I'm not sure if it's the same issue. We're still on 0.8.8 and can't really move forward, since it's pretty tough to pinpoint the issue (repro case only in production under load). I think there may be a bug somewhere between 0.8.8 and 0.9.4 that is even more apparent with 0.9.7. Are you able to reproduce the issue under a repeatable load test?

morhekil commented 5 years ago

@pts-owentran not yet. We have just pinpointed curb as the source of the problem and are investigating it further.

taf2 commented 5 years ago

@morhekil let me know if you can pinpoint the issue. I'm sure this is related to my attempts at fixing GC issues.

robuye commented 5 years ago

I think I can reproduce it locally. I used an example code from readme and just executed it in irb:

require 'curb'

def run!
  100.times do
    responses = {}
    requests = ["http://www.google.co.uk/", "http://www.ruby-lang.org/"]
    m = Curl::Multi.new
    # add a few easy handles
    requests.each do |url|
      responses[url] = ""
      c = Curl::Easy.new(url) do|curl|
        curl.follow_location = true
        curl.on_body{|data| responses[url] << data; data.size }
        curl.on_success {|easy| puts "success, add more easy handles" }
      end
      m.add(c)
    end

    m.perform do
      puts "idling... can do some work here"
    end
  end
  nil
end

run!

it leaves 200 connections:

netstat -anp |grep irb |head
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.0.102:55282     216.58.215.67:80        ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:36700     151.101.1.178:80        ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:50748     151.101.65.178:80       ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:55250     216.58.215.67:80        ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:51412     151.101.193.178:80      ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:55266     216.58.215.67:80        ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:60046     151.101.129.178:80      ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:36764     151.101.1.178:80        ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:36732     151.101.1.178:80        ESTABLISHED 10960/irb       
tcp        0      0 192.168.0.102:55034     216.58.215.67:80        ESTABLISHED 10960/irb       

netstat -anp |grep irb |wc -l
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
200

I tested on curb 0.9.7 and 0.9.4 and observed identical behavior. Tested on ruby 2.3.3p222. The connections are closed when I exit irb, otherwise the number grows by 200 on each run! (so +1 for each request).

0.9.3 works as expected, I'm seeing 2 connections at most and everything cleaned up by the time run! returns.

I will try to find a commit that introduces this regression.

morhekil commented 5 years ago

@pts-owentran @robuye yep, my apologies - I just double-checked the history, and we did update from 0.9.3, not 0.9.4. So it looks like it's consistent here - 0.9.3 was working fine, later versions are leaking connections

robuye commented 5 years ago

I did git bisect between 3be35e9ffbd00343633108c88f3cc3379f073a43 (0.9.3, good) and a0df1c774d86802dadc79f0b41472164fd1c29c2 (0.9.4, bad):

504f1a7e78a0a3c0dd26ffd419b6a817c263547c is the first bad commit
commit 504f1a7e78a0a3c0dd26ffd419b6a817c263547c
Author: Todd Fisher <todd.fisher@gmail.com>
Date:   Sat Aug 26 21:25:28 2017 -0400

    Revert "experimenting with adding multi handle cleanup after every request, normally would be cleaned up with GC"

    This reverts commit e3126a874ccfae8adf4cd845b2d86f0f2aeda932.

    I believe the maxfd fix is the correct solution to improve stability not this patch

:040000 040000 c435026c1d142cefb02f9063e9c0c7d75d340f9e 0159137502fb60289ace44d05623bc6f17702e27 M      ext

b3d265b seems to be the last good commit. 504f1a7 reverts e3126a8 and triggers the regression. There was a range of other commits in between so something could have been added there too, may be worth looking at. Here's the list:

504f1a7  Revert "experimenting with adding multi handle cleanup after every request, normally would be cleaned up with GC"
b3d265b  Fixes #323 - we should start to handle maxfd == -1 case
b88d649  reduce warnings when running test with ruby 2.4.1
da1cc52  Merge pull request #318 from roscopeco/master
fd586cc  Remove ruby warnings
1b46929  A few more test fixes for Windows (#317)
7aa1c89  Merge branch 'master' of https://github.com/taf2/curb
f3e2797  Fix EOL handling for Windows compatibility
174ae8e  (Hopefully) fix tests on Windows across the board, including JRuby etc.
aea174b  Re-fix tests on Windows
5be39a9  Update homepage (#311)
ee2834a  Make tests work on Windows (#316)
6ab360d  Make tests work on Windows
fd02907  Add cookielist method to Curb::Easy (#308)
7799f9a  add support for CURLOPT_PATH_AS_IS (#286)
9573ed5  Revert "Add support for http PATCH (#301)"
0178101  Add support for http PATCH (#301)
32ca39a  Add a require statement to help with new users who are getting started with curb (#300)
e6b9903  Fix multi.pipeline: HAVE_CURLMOPT_PIPELINING was never defined (#298)
3be35e9  gem version update
3ed0f62  correctly check for constants not available in older version of libcurl
c6f0a25  update for release 0.9.2
8de4e4c  Add travis-ci build status label (#282)
8f60aa9  Support empty reason phrases in the HTTP status line (#292)
4234d4a  Use LONG2NUM and NUM2LONG to support all values of the 'long' type (#295)
589b282  Fix http_auth_types for :any and :anysafe (#291)
e3126a8  experimenting with adding multi handle cleanup after every request, normally would be cleaned up with GC

Note that e3126a8 has a comment on GitHub: it apparently introduced a regression and was reverted as a result. This is very interesting, because before the revert the connections would always be closed and couldn't be reused. Now it seems the connections are not closed, but they are not reused either.

When I was doing my tests I looked only at the number of connections, not at how old they were.

That's as far as I got for now.

taf2 commented 5 years ago

Thanks @robuye !

taf2 commented 5 years ago

Okay, I need to test this more but maybe adding:

diff --git a/ext/curb_multi.c b/ext/curb_multi.c
index 705e522..aa588a9 100644
--- a/ext/curb_multi.c
+++ b/ext/curb_multi.c
@@ -276,6 +276,9 @@ static void rb_curl_mutli_handle_complete(VALUE self, CURL *easy_handle, int res

   curl_easy_getinfo(rbce->curl, CURLINFO_RESPONSE_CODE, &response_code);

+  // request is finished remove easy handle from multi handle to allow connections to be freed
+  rb_funcall(self, rb_intern("remove"), 1, easy);
+
   if (result != 0) {
     if (!rb_easy_nil("failure_proc")) {
       callargs = rb_ary_new3(3, rb_easy_get("failure_proc"), easy, rb_curl_easy_error(result));

taf2 commented 5 years ago

Argh, we already do that... so never mind. Also, running the example test provided and then running netstat, I'm not seeing any connection leaks... maybe I'm not testing this correctly?

while running the test script:

taf2@ip-10-55-11-11:~/work/curb$ netstat -anp |grep ruby |head
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 10.55.11.11:54086           151.101.249.178:80          ESTABLISHED 18298/ruby
tcp        0      0 10.55.11.11:38648           172.217.8.3:80              ESTABLISHED 18298/ruby

taf2 commented 5 years ago

What version of libcurl are you guys testing on? I ran my test on ruby-2.5.3 and libcurl 7.53.1

robuye commented 5 years ago

I didn't have time to sit on it since my last comment.

Here's the configuration I was using:

root@f764d8e598a6:/code# ruby -v
ruby 2.3.8p459 (2018-10-18 revision 65136) [x86_64-linux]
root@f764d8e598a6:/code# curl -V
curl 7.52.1 (x86_64-pc-linux-gnu) libcurl/7.52.1 OpenSSL/1.0.2q zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL 

This is from docker, and I also tested it locally in the following configuration:

$ ruby -v
ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-linux]

$ curl -V
curl 7.61.1-DEV (x86_64-pc-linux-gnu) libcurl/7.61.1-DEV OpenSSL/1.0.2g zlib/1.2.8
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets HTTPS-proxy 

Something to bear in mind: you need to recompile and reload irb when you check out a different commit for testing.

@taf2 I mentioned I used an example from the readme, but I also added a loop there to run it 100 times. Not sure you noticed, but without it 2 FDs are expected.

taf2 commented 5 years ago

Yup, I increased the loop to 1000 and monitored FDs open and it never exceeded 2 for me. I used ruby-2.5.3 for testing... and libcurl 7.53.1 and was not able to leak any FDs...

robuye commented 5 years ago

alright, let me check it out on ruby 2.5, I will be back shortly.

morhekil commented 5 years ago

for us the server where we see the leaks runs ruby 2.5.1 and libcurl 7.22.0

robuye commented 5 years ago

It's the same on ruby 2.5.3p105 and curl 7.52.1.

@taf2 are you on osx perhaps?

taf2 commented 5 years ago

@robuye ec2 aws amazon linux - so a flavor of centos...