ruby-rdf / sparql-client

SPARQL client for Ruby.
http://rubygems.org/gems/sparql-client
The Unlicense
112 stars 58 forks source link

Unclosed connections with SPARQL::CLIENT #86

Closed jeffreycwitt closed 5 years ago

jeffreycwitt commented 5 years ago

Hi there,

I've been struggling with a problem for a while now in my rails of unclosed connections leading to the app exceeding its limits and returning errors like:

Error during failsafe response: Too many open files @ rb_sysopen - /app/app/views/errors/internal_server_error.html.erb

After making sure every open-uri request is closed and no luck solving the problem I've started investigating deeper.

using lsof I've noticed each page request delivers the following

TCP 192.168.0.22:52703->144.126.4.43:http (CLOSE_WAIT) ruby 8093 jcwitt 50u IPv4 0x47acd29a34c3cb9b 0t0 TCP 192.168.0.22:52800->144.126.4.43:http (CLOSE_WAIT) ruby 8093 jcwitt 51u IPv4 0x47acd29a34bd9b9b 0t0 TCP 192.168.0.22:52803->144.126.4.43:http (CLOSE_WAIT) ruby 8093 jcwitt 52u IPv4 0x47acd29a2bec3f7b 0t0 TCP 192.168.0.22:52804->144.126.4.43:http (CLOSE_WAIT) ruby 8093 jcwitt 53u IPv4 0x47acd29a33acf23b 0t0 TCP 192.168.0.22:52704->144.126.4.43:http (CLOSE_WAIT) ruby 8093 jcwitt 54u IPv4 0x47acd29a2df4723b 0t0 TCP 192.168.0.22:52705->144.126.4.43:http (CLOSE_WAIT)

and as the app gets used the number of (CLOSE_WAIT) files keeps climbing, about 5 per page requests.

Using the IP, I learned that these requests are being made with the following snippet of code

sparql = SPARQL::Client.new(sparqlendpoint) result = sparql.query(query) return result

BTW I'm using sparql-client (~> 2.0).

So I'm wondering if I'm doing something wrong with the way I'm using SPARQL::Client or perhaps I've found a bug and SPARQL::Client should be closing connections and is not.

https://github.com/ruby-rdf/sparql-client/blob/develop/lib/sparql/client.rb#L726-L731

Among other places a http request is being opened at line 729.

Is getting closed anywhere?

I appreciate any help you can provide.

gkellogg commented 5 years ago

The connection is opened in SPARQL::Client#initialize, and is used for queries and updates using the resulting instance. It would normally be garbage collected, and the connection closed when the instance goes out of scope, as with general Ruby object management.

The gem uses Net::HTTP::Persistent, which does support a shutdown method. We should add a test to verify that this is called when the class is garbage collected.

We could also add a close method, which would make this explicit, but would need to clear the reference to Het::HTTP::Persistent, freeze the instance, and catch any further attempt to use it.

I would expect that the connection would be closed when the client instance is deallocated.

jeffreycwitt commented 5 years ago

I would expect this kind of garbage collection as well,

The docs say as much

If you don't call shutdown, that's OK. Ruby will automatically garbage collect and shutdown your HTTP connections when the thread terminates. (https://www.rubydoc.info/github/carlhuda/bundler/Net/HTTP/Persistent)

...which is why I am confused that the CLOSE_WAITS never go away but keep piling up.

But it really appears that these connections are not getting garbage collected. My rails app runs for about 12 hours, but then reliably crashes and return "Too many files open" errors for every subsequent attempt to open a new connection.

As mentioned above running lsof in development that the number CLOSE_WAIT files continues to climb.

Is there anything I can do, at the present, to force close the connection with modifying the sparql-client library directly?

Here's the code where I'm instantiating the SPARQL::CLIENT

sparql = SPARQL::Client.new(sparqlendpoint) https://github.com/lombardpress/lbp.rb/blob/master/lib/lbp/query.rb#L33

I would love to force close the connection with something like sparql.close()

Or is there way to force close all open Net::HTTP::Persistent connections. Basically force garbage collection.

gkellogg commented 5 years ago

@jeffreycwitt can you try the "finish-connection" branch to see if it fixes your problem, it's in PR #87.

jeffreycwitt commented 5 years ago

@gkellogg I'm now using the finish-connection branch now and the problem still seems to be persisting.

I've added a rails profiler.

And you can see here that via the sparql-client / net-persistant libraries 4 http calls are being made per page load to my sparql endpoint

image

Here you can see the output of lsof after 3 calls. There are now 12 open connections, the most page request still has 4 "ESTABLISHED" connection, the previous 8 request have moved from "ESTABLISHED" to "CLOSE_WAIT"

image

After a little more time, and no further requests, all "ESTABLISHED" requests move to "CLOSE_WAIT" but the CLOSE_WAIT connections never (seem) to go away, as you can see below.

image

On production server, this will continue until it reaches the open file limit, which on Heroku is 10,000 open files. Then I have to restart the server.

If you have any other ideas, I would really love to hear them. Thanks again for all the help thus far.

P.s. here are the links to my implementation of the sparql/client code.

First in my ruby gem (lbp.rb) in the Lbp::Query class

https://github.com/lombardpress/lbp.rb/blob/master/lib/lbp/query.rb#L33

The Lbp::Query class is then extended in the the "LombardPress-Web App", but the same Lbp::Query.query() method defined in the above link is used again and again through the LombardPress-Web App, as seen below.

https://github.com/lombardpress/lombardpress-web/blob/master/app/models/misc_query.rb

gkellogg commented 5 years ago

Can yo provide me with. Minimal example that reprocessing the problem? I beeeieve the issue lies in Net::HTTP::Persistant

jeffreycwitt commented 5 years ago

Ok,

if you can clone this branch of the lbp.rb gem

https://github.com/lombardpress/lbp.rb/tree/feature/close-wait-test

Then run bundle (I'm still using ruby 2.2.2, but plan to update once I solve this).

Then run rspec spec/query_load_spec.rb

This will run 100 requests, delayed every second

Once that is started, find the port of the running process and run

lsof -p <portnumber>

You will start to a list of connections like so

image

but keep running the lsof command.

Eventually connections will start changing to WAIT_CLOSE like this: image

P.s. this has already helped me confirm that the problem is not in the rails app, since i'm reproducing the problem here just in the tests of the lbp.rb gem.

jeffreycwitt commented 5 years ago

I just ran this against dbpedia's SPARQL endpoint as well to make sure my SPARQL endpoint wasn't doing something weird.

I get the same result, established connections, turning to close_and_wait, that don't seem to every fully close.

I pushed the modifications for this as well. If you want to run this test, see the comments in the query_load_spec for the ENV variables that need to be set.

jeffreycwitt commented 5 years ago

Hi @gkellogg, just wondering if my gem branch was sufficient for you to recreate the problem. Let me know what I can do to help track down the issue. Should we create an issue with Net::HTTP::Persistent ???

gkellogg commented 5 years ago

I’ve been at W3C TPAC all week, so haven’t had a chance to follow up. When I can reproduce, perhaps we’ll be able to see if Net::HTTP::Persistent is indeed implicated. We’d need a reproducible use case.

jeffreycwitt commented 5 years ago

Ok no worries whatsoever.

I think have a created a reproducible use case. Whenever you have time see my post above: https://github.com/ruby-rdf/sparql-client/issues/86#issuecomment-431565395.

Let me know if you don't think this is sufficient or can't reproduce yourself. I was able to reduce the problem with both my SPARQL endpoint and the dbpedia SPARQL endpoint .

Best, jw