iconara / cql-rb

Cassandra CQL 3 binary protocol driver for Ruby
106 stars 31 forks source link

Memory leak while using prepared statements #93

Closed valums closed 10 years ago

valums commented 10 years ago

Hi Theo,

Thanks so much for your work on the cql-rb driver, the codebase and docs are excellent, so much better than that twitter thrift based client.

We have found a memory leak while using prepared statements which is easily reproducible. For our web use-case the memory goes from 80MB per thread to 1GB overnight.

Gem version: tested with 1.1.1, 1.2.1, 2.0.0.pre1 Casandra version: tested with with cassandra-1.2.12, cassandra-2.0.6 Ruby version: ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-linux]

Code:

require 'cql'
cql = Cql::Client.connect(:hosts => ["10.0.0.1"])
cql.use 'test'

ts = Cql::TimeUuid::Generator.new.from_time(Time.now)
while true do
  statement = cql.prepare("INSERT INTO test_table (ts) VALUES (?)")
  statement.execute(ts)
end

Steps to reproduce:

apache-cassandra-1.2.12/bin/cqlsh 10.0.0.1

CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

CREATE TABLE test_table (ts timeuuid,  PRIMARY KEY (ts));

ruby app.rb

pmap -x [put process pid here]

wait a few minutes

memory used is up to 35MB from 15MB and growing fast

Our workaround:

Use queries with bound params from the 2.0.0.pre1 version of the gem.

iconara commented 10 years ago

Thank you for the detailed bug report, it's great to get one so well researched.

Yes there is a small leak with prepared statements, but it is not relevant if you use prepared statements correctly. As mentioned in the readme and in other places prepared statements are great because they are reusable. If you don't reuse them they're worse than non-prepared statements because they require an extra round-trip to the server.

Prepare your statements when your application starts up and then reuse them for the whole lifetime of the application. The reason why I've let that leak remain is that prepared statements are long lived objects, they are meant to be even more long lived than connections (the leak occurs because the statement objects attach themselves to the connection), and from that perspective it's not really a leak at all.

iconara commented 10 years ago

In other words: move the statement preparation out of the loop and the leak goes away.

valums commented 10 years ago

Ok, thanks for the quick response. Just to clarify, we do not create statements in a loop, they are created in response to http query in a long running process and leak memory over time. But I guess it is a good idea to cache them globally anyway. Feel free to close this issue.

iconara commented 10 years ago

I figured it was something like that, good that you wrote the example the way you did though, it helped me understand the root issue pretty quickly.

You should definitely create the prepared statements and reuse them, otherwise your requests will be unnecessarily slow. The preparation isn't super-expensive, but it sends a request to each node and waits for them all to respond – and only after that can you do the actual request. Using on-the-fly prepared statements with v2.0.0.pre1 is an alternative, as you've discovered, it's a middle ground. It's got the same benefits convenience-wise as prepared statements, and don't have the extra round-trip to the server.

v2.0.0.pre1 is stable, if you've run it for a while in a testing environment I wouldn't worry too much about running it in production. There won't be many changes before I release v2.0.0, I just want to make sure I've set the API right since I won't be able to change that for a while after releasing a major version.