Eventmachine support documentation

rubycut commented 11 years ago

Hi,

I am wondering whether I can use your gem within an EventMachine reactor. Could you add a note to the documentation saying whether this is supported and how to use it, or whether it is a planned feature?

I see that you have an async client file and an IO reactor, but I am not sure how they relate to EventMachine.

iconara commented 11 years ago

You can't use EventMachine as the IO reactor for cql-rb. It has its own reactor implementation, and even though it's designed to be interchangeable there's no alternative implementation currently (except in the tests, which use a fake IO reactor that is more instrumentable). I have an ambition of writing a Netty-based IO implementation to give JRuby even better IO performance, but that's something for v2.0; it's not currently on the roadmap.

If you want to integrate EventMachine with cql-rb it's certainly possible. You'd have to create custom implementations that provided the same interface as Cql::Io::IoReactor and Cql::Io::Connection, and then pass an instance of your own reactor to the client via the :io_reactor option.

rubycut commented 11 years ago

I might not need to write it. As far as I understand, your code will not block the EventMachine reactor, since it runs all IO operations in a separate thread. That means I can run your gem inside an EventMachine reactor without blocking it. Am I right?

iconara commented 11 years ago

Yes and no.

cql-rb's reactor will not block EventMachine's. But if you do something like this:

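require 'cql'
require 'eventmachine'

client = Cql::Client.connect
# EM.run starts the EventMachine reactor (there is no EM.start)
EM.run do
  EM.next_tick do
    puts 'hello world'
  end
  # this call blocks the reactor thread until Cassandra has responded
  client.execute('INSERT INTO stuff (something) VALUES (42)')
end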

then "hello world" would not be printed until the Cassandra operation had returned. In other words, the cql-rb API is a blocking API, even though under the covers it is fully asynchronous.

There are two ways around this: either you run your EventMachine and Cassandra operations in separate threads (not counting cql-rb's own IO thread), or you try cql-rb's experimental asynchronous API.

Depending on which version of cql-rb you're using it looks a bit different. Not very much, but it's experimental, and will remain experimental. At the same time, it's the interface I'm primarily using myself, for performance reasons, and it's actually how cql-rb works under the hood. If you look at the implementation of the default client Cql::Client::SynchronousClient in the latest version (v1.1.0.pre7) you can see that all it does is call the same method on Cql::Client::AsynchronousClient and then wait for the result of the operation.
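The relationship between the two clients can be sketched roughly as follows. This is a minimal illustration only: TinyFuture, TinyAsyncClient and TinySyncClient are hypothetical stand-ins invented for the example, not cql-rb's actual classes, and the "query" is simulated by a background thread.

```ruby
# Hypothetical stand-in for a future. This is NOT cql-rb's Cql::Future.
class TinyFuture
  def initialize
    @lock = Mutex.new
    @resolved = ConditionVariable.new
    @done = false
    @value = nil
    @listeners = []
  end

  # Register a callback; it runs right away if the value is already available.
  def on_value(&listener)
    run_now = false
    @lock.synchronize do
      if @done
        run_now = true
      else
        @listeners << listener
      end
    end
    listener.call(@value) if run_now
    self
  end

  # Called by the IO side when the response arrives.
  def resolve(value)
    listeners = nil
    @lock.synchronize do
      return if @done
      @value = value
      @done = true
      listeners = @listeners.dup
      @listeners.clear
      @resolved.broadcast
    end
    listeners.each { |listener| listener.call(value) }
  end

  # Block the calling thread until the future has been resolved.
  def value
    @lock.synchronize do
      @resolved.wait(@lock) until @done
      @value
    end
  end
end

# Hypothetical async client: returns immediately with a future.
class TinyAsyncClient
  def execute(cql)
    future = TinyFuture.new
    Thread.new { future.resolve("result of #{cql}") } # simulated IO thread
    future
  end
end

# Hypothetical sync client: same call on the async client, then wait.
class TinySyncClient
  def initialize(async_client)
    @async_client = async_client
  end

  def async
    @async_client
  end

  def execute(cql)
    @async_client.execute(cql).value
  end
end
```

With these stand-ins, TinySyncClient.new(TinyAsyncClient.new).execute('SELECT 1') blocks the caller until the simulated response arrives, while calling .async.execute('SELECT 1') returns a future straight away.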

There's no documentation for the async API, but here's a quick start:

client = Cql::Client.connect
future = client.async.execute('SELECT * FROM stuff')
future.on_value do |result|
  result.each do |row|
    p row
  end
end

Notice that I called client.async.execute(...) and not client.execute(...). Async clients return futures, and you can register callbacks on futures. You can also chain futures and transform them in different ways, but you'll have to look at the documentation of Cql::Future for that (it's not in the public docs, but look at the code, there are a few examples, and the specs have even more).

Using the async API you can avoid blocking your EventMachine reactor.
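The other option mentioned above, keeping blocking Cassandra calls on a separate thread, might look roughly like this. It is a sketch in plain Ruby under stated assumptions: blocking_execute is a hypothetical stand-in for a blocking call such as client.execute(...), and the main thread plays the role of the event loop.

```ruby
require 'thread'

# Hypothetical stand-in for a blocking call such as client.execute(...).
def blocking_execute(cql)
  sleep 0.05 # pretend to wait for Cassandra
  "done: #{cql}"
end

requests = Queue.new
responses = Queue.new

# Worker thread: the only place where blocking calls happen.
worker = Thread.new do
  while (cql = requests.pop) != :stop
    responses << blocking_execute(cql)
  end
end

events = []
# Handing off the statement returns immediately...
requests << 'INSERT INTO stuff (something) VALUES (42)'
# ...so the loop thread can carry on with other work first.
events << 'hello world'
# The result is collected later, once the worker has finished.
events << responses.pop

requests << :stop
worker.join
```

Inside a real EventMachine application, EM.defer provides a similar hand-off to a thread pool, with the result delivered back on the reactor thread through a callback.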

rubycut commented 11 years ago

I am using cql-rb version 1.0.5, and by calling statement.async.execute instead of statement.execute I am seeing a threefold increase in speed: I can process 150 messages per second instead of 50.

But of course other things are also using CPU, so I think it is at least 10 times faster if we look only at the time used by cql-rb.

I suggest you mention this somewhere in the docs and let people try it.

Cassandra is super fast at swallowing data, and it's a real deal breaker if the client library can't push data fast enough.

michaelklishin commented 11 years ago

The version that uses futures is not exactly identical in the way failures are handled. I would document both versions and explain the difference but would highly recommend using the sync version in most examples. Failures with it are immediately visible.
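To illustrate the difference in failure visibility, here is a small sketch. FailedFuture, async_execute and sync_execute are hypothetical stand-ins written for this example, and the on_failure name is an assumption of the sketch rather than a statement about cql-rb's future API.

```ruby
# Hypothetical stand-in for a future that has already failed.
# This is NOT cql-rb's Cql::Future.
class FailedFuture
  def initialize(error)
    @error = error
  end

  # There is no value, so value callbacks are never invoked.
  def on_value
    self
  end

  # Failure callbacks receive the error.
  def on_failure
    yield @error
    self
  end

  # Waiting for the value re-raises the error in the calling thread.
  def value
    raise @error
  end
end

# Hypothetical async call that always fails.
def async_execute(cql)
  FailedFuture.new(RuntimeError.new("bad query: #{cql}"))
end

# Hypothetical sync call: same operation, then wait for the result.
def sync_execute(cql)
  async_execute(cql).value
end

# Synchronous style: the error surfaces at the call site.
sync_error = nil
begin
  sync_execute('SELEKT')
rescue RuntimeError => e
  sync_error = e.message
end

# Future style with only a value callback: nothing is raised and
# the callback never runs, so the failure goes unnoticed.
rows = []
async_execute('SELEKT').on_value { |result| rows << result }

# Future style with an explicit failure callback: the error is seen.
async_error = nil
async_execute('SELEKT').on_failure { |e| async_error = e.message }
```

The middle case is the risky one: without a failure callback, the program carries on as if nothing had happened.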

iconara commented 11 years ago

It’s good to know that the async API works for you, thanks for trying it out. As for making it public, I’m reluctant to do that before I feel like it is something I want to support long term. It’s so much harder to make an async API, and so much harder to use it right (error handling, like Michael mentions, is one example of something people get very wrong).

I’m happy to tell people about it and have them test it out like you do, and doing it this way makes it clearer that this is experimental. When cql-rb has gotten more adoption and there is a larger user base with high performance needs I will make an effort to finalize the async API and publish it.

You should be able to run around ten thousand prepared statement executions per second with cql-rb. Sometimes you need to have multiple connections to each node open for that to happen, but thousands of operations per second should be possible without any tuning at all. For the best performance try the v1.1 prereleases, they have quite a few performance optimizations.

rubycut commented 11 years ago

ok, closing this. Thanks for all the info.

iconara / cql-rb

Eventmachine support documentation #48