mloughran / em-hiredis

Eventmachine redis client
MIT License

Pipeline support ? #14

Open fonzo14 opened 12 years ago

fonzo14 commented 12 years ago

Hi,

Do you plan to add support for pipelining?

Thx

mloughran commented 12 years ago

You can already do this

redis.multi
redis.get...
redis.set...
redis.exec

I'm happy to add a more convenient API, but I've not found one that I'm completely happy with yet

fonzo14 commented 12 years ago

Yes, that's true.

But I was thinking of true pipelining, to reduce network latency. In some contexts I need to send a lot of requests to redis, and the multi ... exec commands don't solve that problem.

mloughran commented 12 years ago

I'm not sure what you mean. The requests are asynchronous, so you can send requests as rapidly as you like.

fonzo14 commented 12 years ago

Well, sorry, my English is poor. OK, let's try an example. I have a web API app that returns JSON. My web server is Goliath, so I'm using em-hiredis the em-synchrony way.

For one particular API call, I have to make 100 requests to redis, which means 100 network round trips to redis (so it's slow). I would have liked to send just one request to redis via pipelining.

mloughran commented 12 years ago

The way EventMachine works is that if you send data multiple times to the same connection during a 'tick' of the event loop, the data will be buffered and sent in one go at the end of the tick (see http://rubydoc.info/github/eventmachine/eventmachine/EventMachine/Connection:send_data ). Therefore sending 1000 commands to redis will not result in 1000 TCP packets.

For example, the following will send all 5 calls to get in the same packet:

require 'em-hiredis'

EM.run {
  redis = EM::Hiredis.connect

  EM.add_periodic_timer(2) {
    5.times { |i|
      redis.get("foo#{i}") { |v|
        p [:response, v]
      }
    }
  }
}

Having said this, a little bit of packet capturing has revealed a little problem which I'm hoping @tmm1 will be able to cast some light upon.

It appears that calls to send_data are batched into TCP packets in groups of 16 even when all the data would easily fit into one packet. For example

require 'eventmachine'

EM.run {
  EM.connect('127.0.0.1', 10000) { |c|
    EM.add_periodic_timer(2) {
      20.times { |i|
        c.send_data("hello#{i}")
      }
    }
  }
}

This results in these packets being sent to the dumb server on port 10000:

Packet: 9, Packetlength: 158 bytes, Packet follows:

00000   02 00 00 00 45 00 00 9a  89 62 40 00 40 06 00 00    ....E....b@.@...
00010   7f 00 00 01 7f 00 00 01  cb d4 27 10 7d 45 b3 d9    ..........'.}E..
00020   72 8b 7e 78 80 18 ff ff  fe 8e 00 00 01 01 08 0a    r.~x............
00030   0d e6 5a 9e 0d e6 52 d1  68 65 6c 6c 6f 30 68 65    ..Z...R.hello0he
00040   6c 6c 6f 31 68 65 6c 6c  6f 32 68 65 6c 6c 6f 33    llo1hello2hello3
00050   68 65 6c 6c 6f 34 68 65  6c 6c 6f 35 68 65 6c 6c    hello4hello5hell
00060   6f 36 68 65 6c 6c 6f 37  68 65 6c 6c 6f 38 68 65    o6hello7hello8he
00070   6c 6c 6f 39 68 65 6c 6c  6f 31 30 68 65 6c 6c 6f    llo9hello10hello
00080   31 31 68 65 6c 6c 6f 31  32 68 65 6c 6c 6f 31 33    11hello12hello13
00090   68 65 6c 6c 6f 31 34 68  65 6c 6c 6f 31 35          hello14hello15

Packet 2:

Packet: 10, Packetlength: 84 bytes, Packet follows:

00000   02 00 00 00 45 00 00 50  c6 5d 40 00 40 06 00 00    ....E..P.]@.@...
00010   7f 00 00 01 7f 00 00 01  cb d4 27 10 7d 45 b4 3f    ..........'.}E.?
00020   72 8b 7e 78 80 18 ff ff  fe 44 00 00 01 01 08 0a    r.~x.....D......
00030   0d e6 5a 9e 0d e6 52 d1  68 65 6c 6c 6f 31 36 68    ..Z...R.hello16h
00040   65 6c 6c 6f 31 37 68 65  6c 6c 6f 31 38 68 65 6c    ello17hello18hel
00050   6c 6f 31 39                                         lo19

@tmm1 is there a reason for this behaviour? Should em-hiredis therefore be buffering pipelined commands into a string before calling send_data? Thanks!
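To make the question concrete, here is a minimal sketch of what buffering into a string before calling send_data could look like. `BufferedWriter` and `RecordingConnection` are hypothetical names invented for this illustration; neither is part of em-hiredis or EventMachine. The stand-in connection only records calls, so the sketch runs without a reactor.

```ruby
# Hypothetical helper, not part of em-hiredis or EventMachine.
# Collects writes and hands them to the connection as one string,
# so the connection sees a single send_data call per flush.
class BufferedWriter
  def initialize(connection)
    @connection = connection # anything that responds to send_data
    @pending = []
  end

  # Queue data instead of writing it immediately.
  def write(data)
    @pending << data
    self
  end

  # Join everything queued so far and send it in one call.
  def flush
    return if @pending.empty?
    @connection.send_data(@pending.join)
    @pending.clear
  end
end

# Stand-in for a connection object; it just records what it was asked to send.
class RecordingConnection
  attr_reader :writes

  def initialize
    @writes = []
  end

  def send_data(data)
    @writes << data
  end
end

conn = RecordingConnection.new
writer = BufferedWriter.new(conn)
20.times { |i| writer.write("hello#{i}") }
writer.flush

p conn.writes.size           # number of send_data calls made
p conn.writes.first.bytesize # total bytes handed over in that call
```

Inside a running reactor, the flush could presumably be scheduled once per tick (for example with `EM.next_tick`), but that wiring is left out here.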

mloughran commented 12 years ago

https://github.com/eventmachine/eventmachine/blob/master/ext/ed.cpp#L977 (the cause of the 16 batching)
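As a sanity check on the capture above, the sketch below groups the 20 writes into batches of 16 and prints the payload size of each batch. The constant name is made up for this example, and the limit of 16 is taken from the observed behaviour rather than from a reading of the linked source.

```ruby
# Assumption: at most 16 queued buffers go out per write, as the capture suggests.
MAX_BUFFERS_PER_WRITE = 16

chunks = (0...20).map { |i| "hello#{i}" }

# Group the queued chunks the way the capture shows them on the wire.
payload_sizes = chunks.each_slice(MAX_BUFFERS_PER_WRITE).map { |group| group.join.bytesize }

p payload_sizes
```

The two sizes, 102 and 28 bytes, agree with the capture: the IP total lengths are 0x9a (154) and 0x50 (80), and subtracting the 20-byte IP header and 32-byte TCP header leaves 102 and 28 bytes of payload.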

fonzo14 commented 12 years ago

Thanks. Interesting. I did not know about this EventMachine behavior. I still have the issue because I'm in an em-synchrony context, where each redis call waits for its answer. But that has more to do with em-synchrony and the way I use it than with em-hiredis. I get your point.

But I think anyway that adding a way to buffer pipelined commands would be a nice add-on to em-hiredis (as jedis and redis-rb do).

Thanks again for your answers.

talbright commented 12 years ago

I think, though, that there is a distinct difference between pipelining and using multi: multi is transactional and pipelining is not:

http://redis.io/topics/transactions

http://redis.io/topics/pipelining

pietern commented 12 years ago

Because EM defers all I/O, the buffering is implicit. If you write 100 commands in the same tick, they are automatically pipelined (without taking into account the EM buffering artifact that @mloughran describes).
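To illustrate what "automatically pipelined" means on the wire, the sketch below encodes a few GET commands in the Redis request protocol and concatenates them, which is roughly what ends up in the outbound buffer when several commands are written in the same tick. `encode_command` is a hypothetical helper written for this example, not an em-hiredis method.

```ruby
# Encode one command in the Redis protocol request format:
# an array header followed by one bulk string per argument.
# Hypothetical helper for illustration only.
def encode_command(*args)
  parts = args.map do |arg|
    s = arg.to_s
    "$#{s.bytesize}\r\n#{s}\r\n"
  end
  "*#{args.size}\r\n" + parts.join
end

# Three commands written back to back, without waiting for any reply.
commands = 3.times.map { |i| encode_command("GET", "foo#{i}") }
wire = commands.join

puts wire.inspect
puts wire.bytesize
```

Nothing marks the commands as a group: pipelining is just requests sent back to back before any reply has been read.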

mloughran commented 12 years ago

@pietern precisely. I can see a weak argument to add a pipelining api in order to work around the EM buffering artefact, but as you say pipelining is automatic.

My opinion is that the current API is good enough, and that if you want to ensure that all the commands are executed by the server consecutively, you can just use multi-exec (and probably should). Is there a performance penalty on the server for using multi-exec when you don't really need atomicity?

talbright commented 12 years ago

@pietern how does the implicit pipelining work on the response side of the equation?

@mloughran There has to be some penalty for using multi-exec if you have multiple clients talking to the same redis server and you are in a high volume situation:

All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single atomic operation.

mloughran commented 12 years ago

@talbright however, redis is single threaded and only deals with a single client's request at any one time; multi-exec doesn't change this.

On the response side, if there are many responses waiting on the socket, em-hiredis will process all available responses in the same event loop tick, calling whatever callbacks you have defined on the commands. However, if you want to guarantee that you get all responses at the 'same time' in your code, then again multi-exec is the way to go. Does that make sense?
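The response side can be sketched in the same spirit. The class below is a simplified, hypothetical stand-in for a reply reader: it understands only simple strings, integers and bulk strings, and it is not how em-hiredis is implemented (em-hiredis relies on hiredis for parsing). It shows how several replies arriving in one read are matched to callbacks in the order the commands were sent.

```ruby
# Simplified stand-in for a reply reader (hypothetical, for illustration).
# Handles only simple strings (+), integers (:) and bulk strings ($).
class ReplyDispatcher
  def initialize
    @buffer = "".b   # binary buffer, so character and byte offsets agree
    @callbacks = []
  end

  # Register the callback for the next command sent.
  def expect(&block)
    @callbacks << block
  end

  # Feed bytes read from the socket; fire one callback per complete reply.
  def receive_data(data)
    @buffer << data.b
    while (reply = next_reply)
      callback = @callbacks.shift
      callback.call(reply.first) if callback
    end
  end

  private

  # Returns [value] for a complete reply, or nil if more bytes are needed.
  def next_reply
    line_end = @buffer.index("\r\n")
    return nil unless line_end

    type = @buffer[0]
    header = @buffer[1...line_end]
    consumed = line_end + 2
    value =
      case type
      when "+" then header
      when ":" then header.to_i
      when "$"
        len = header.to_i
        if len >= 0 # a length of -1 means a nil reply with no body
          return nil if @buffer.bytesize < consumed + len + 2
          body = @buffer.byteslice(consumed, len)
          consumed += len + 2
          body
        end
      else
        raise "unsupported reply type #{type.inspect}"
      end
    @buffer = @buffer.byteslice(consumed..-1)
    [value]
  end
end

dispatcher = ReplyDispatcher.new
seen = []
3.times { dispatcher.expect { |v| seen << v } }

# Three replies arriving in a single read, as they might after a pipelined burst.
dispatcher.receive_data("$3\r\nbar\r\n$-1\r\n+OK\r\n")
p seen
```

All three callbacks fire during the one `receive_data` call, in the order the commands were issued. If a reply is split across reads, the incomplete part stays in the buffer until the rest arrives.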