hufei01 / redis

Automatically exported from code.google.com/p/redis
BSD 3-Clause "New" or "Revised" License

Response protocol not efficient enough for socket programming #243

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Redis is a fast database. I am using Python to connect to Redis. I tested redis-py but was not impressed with its speed. The main issue seems to be efficient socket programming in Python, not the Redis database itself. So, I tried to write my own Python Redis client. Please note that I am just an intermediate Python programmer.

This article (http://www.amk.ca/python/howto/sockets/) has the following to say 
about sockets: 

"I repeat: if a socket send or recv returns after handling 0 bytes, the 
connection has been broken. 
If the connection has not been broken, you may wait on a recv forever, because 
the socket will 
not tell you that there's nothing more to read (for now). Now if you think 
about that a bit, you'll 
come to realize a fundamental truth of sockets: messages must either be fixed 
length (yuck), or 
be delimited (shrug), or indicate how long they are (much better), or end by 
shutting down the 
connection. The choice is entirely yours, (but some ways are righter than 
others)."

Redis seems to be using a delimited response. This means I have to do a lot of 
socket.recv's to 
retrieve a response.

For example, in a multi-bulk response you can find out the number of items that are going to be returned. But there is no information on how many bytes the total response is going to be, so you are left with the only option of doing multiple socket.recv's to retrieve the response.
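To make the point concrete, here is a minimal sketch (not redis-py code; the socket is simulated with io.BytesIO) of parsing the current multi-bulk format as a delimited stream. The one-byte reads in read_line show the worst case a client faces when scanning for the CRLF delimiter:

```python
# Sketch of parsing the current multi-bulk reply format from a stream.
# io.BytesIO stands in for a connected socket; on a real socket the
# stream.read(1) calls would be socket.recv(1).
import io

def read_line(stream):
    """Read up to the CRLF delimiter, one byte at a time (worst case)."""
    line = bytearray()
    while not line.endswith(b"\r\n"):
        chunk = stream.read(1)
        if not chunk:
            raise ConnectionError("connection closed")
        line += chunk
    return bytes(line[:-2])

def read_multi_bulk(stream):
    count = int(read_line(stream)[1:])       # "*4" -> 4 items follow
    items = []
    for _ in range(count):
        length = int(read_line(stream)[1:])  # "$3" -> 3 payload bytes
        data = stream.read(length + 2)[:-2]  # payload plus trailing CRLF
        items.append(data)
    return items

reply = b"*4\r\n$3\r\nfoo\r\n$3\r\nbar\r\n$5\r\nHello\r\n$5\r\nWorld\r\n"
print(read_multi_bulk(io.BytesIO(reply)))  # [b'foo', b'bar', b'Hello', b'World']
```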

So, I have the following feature request: include the number of bytes of the TOTAL response at the beginning of each response (especially multi-bulk responses). This gives client programmers the opportunity to retrieve the response in one call and then process the response on the client side. This could improve the performance of pipelined requests even further.

For example, change this:

C: LRANGE mylist 0 3
S: *4
S: $3
S: foo
S: $3
S: bar
S: $5
S: Hello
S: $5
S: World

into this:

C: LRANGE mylist 0 3
S: *4$24
S: $3
S: foo
S: $3
S: bar
S: $5
S: Hello
S: $5
S: World

*4$24 indicates that 4 items are going to be sent with a total length of 24 bytes, excluding the \r\n's (which in this case add another 4 items * 2 lines * 2 bytes = 16 bytes).
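A hypothetical sketch of how a client could exploit such a header: after parsing the proposed "*&lt;count&gt;$&lt;bytes&gt;" prefix (this format is only the suggestion above, not the actual Redis protocol), the remaining size is known and the rest of the reply can be fetched in a single read. All names are illustrative:

```python
# Hypothetical parser for the PROPOSED "*<count>$<payload bytes>" header.
# This is NOT the real Redis protocol; it only illustrates the one-read idea.
import io

def read_prefixed_reply(stream):
    header = bytearray()
    while not header.endswith(b"\r\n"):
        header += stream.read(1)
    count_s, total_s = header[1:-2].split(b"$")   # b"*4$24\r\n" -> 4, 24
    count, total = int(count_s), int(total_s)
    # every item contributes two CRLF-terminated lines -> count * 2 * 2 bytes
    body = stream.read(total + count * 4)
    return count, body

reply = b"*4$24\r\n$3\r\nfoo\r\n$3\r\nbar\r\n$5\r\nHello\r\n$5\r\nWorld\r\n"
count, body = read_prefixed_reply(io.BytesIO(reply))
```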

Thanks.

Original issue reported on code.google.com by berry.gr...@gmail.com on 13 May 2010 at 2:47

GoogleCodeExporter commented 9 years ago
Why don't you read as much as you can into a local buffer and then parse that?

Original comment by mel...@gmail.com on 13 May 2010 at 3:04
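That tip can be sketched roughly as follows (illustrative names; the socket is simulated with io.BytesIO, and a real client would call socket.recv inside _fill). Pulling large chunks into a local buffer means the number of recv calls no longer scales with the number of protocol lines:

```python
# Minimal sketch of the buffered-read tip: fetch big chunks, serve
# lines and fixed-size blocks from a local buffer.
import io

class BufferedReader:
    def __init__(self, sock, chunk_size=4096):
        self.sock = sock
        self.chunk_size = chunk_size
        self.buf = b""

    def _fill(self):
        chunk = self.sock.read(self.chunk_size)  # real socket: recv()
        if not chunk:
            raise ConnectionError("connection closed")
        self.buf += chunk

    def read_line(self):
        while b"\r\n" not in self.buf:
            self._fill()
        line, self.buf = self.buf.split(b"\r\n", 1)
        return line

    def read_exact(self, n):
        while len(self.buf) < n:
            self._fill()
        data, self.buf = self.buf[:n], self.buf[n:]
        return data

reader = BufferedReader(io.BytesIO(b"$6\r\nfoobar\r\n"))
length = int(reader.read_line()[1:])           # "$6" -> 6
payload = reader.read_exact(length + 2)[:-2]   # b"foobar"
```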

GoogleCodeExporter commented 9 years ago
Hello Berry,

probably what you are noticing is the delay due to the round-trip time, not something related to parsing speed. The parsing speed can be improved with a buffer if you really need to go faster, but I bet this is not going to be your bottleneck.

There are several reasons why there is no prefixed length:
- In our benchmarks this is not the real problem.
- When replies are big, reading the whole thing at once is not a good idea anyway; the current format is designed to be parsed as a stream.
- Buffering the whole reply in order to be able to attach a length prefix is slow or tricky depending on the implementation... there is the need to place sentinels in the output buffer and so forth.
- We'll add UDP for very high performance read queries.

Cheers,
Salvatore

Original comment by anti...@gmail.com on 13 May 2010 at 3:15

GoogleCodeExporter commented 9 years ago
Using buffered reads drastically reduces the number of socket.recv's and thus improves the performance. Thanks for the tip.

Original comment by berry.gr...@gmail.com on 13 May 2010 at 10:39

GoogleCodeExporter commented 9 years ago
Please feel free to contribute patches to redis-py if you find ways to speed 
things up.

Original comment by sed...@gmail.com on 14 May 2010 at 12:55

GoogleCodeExporter commented 9 years ago
@sedrik I am still investigating. If I find some useful improvements I will 
contribute patches to redis-py.

Original comment by berry.gr...@gmail.com on 14 May 2010 at 5:30

GoogleCodeExporter commented 9 years ago
I have to agree with the OP. I've built a BSON parser (for the C# MongoDB drivers), and such a protocol is easier to parse, faster to parse, and causes less memory fragmentation. Sure, it's relatively trivial if you want to use a blocking operation, treat the TCP stream like a file stream, and execute a ReadLine(), but if you want to use asynchronous callbacks and enable connection pooling, data fragmentation is a pain to deal with. Even something as basic as having a 2-byte delimiter causes unnecessary complication.

The docs give the following as a sample reply to GET mykey:
$6\r\nfoobar\r\n

I think this would be much simpler:
$6foobar

where 6 is a 4-byte integer. This would not only make the data stream slightly smaller but would also make parsing much easier.
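A hypothetical encoding of this suggestion, using a fixed 4-byte big-endian length prefix (this is not the Redis protocol; the layout and names are made up purely to illustrate why fixed-width prefixes are easy to parse out of a buffer):

```python
# Hypothetical fixed-width framing: type byte, 4-byte big-endian length,
# then the raw payload. NOT the Redis protocol; for illustration only.
import struct

def encode_bulk(payload: bytes) -> bytes:
    return b"$" + struct.pack(">I", len(payload)) + payload

def decode_bulk(buf: bytes, pos: int = 0):
    assert buf[pos:pos + 1] == b"$"
    (length,) = struct.unpack_from(">I", buf, pos + 1)
    start = pos + 5
    # return the payload and the offset of the next frame in the buffer
    return buf[start:start + length], start + length

msg = encode_bulk(b"foobar")
payload, next_pos = decode_bulk(msg)
```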

Original comment by google84...@openmymind.io on 19 May 2010 at 12:48

GoogleCodeExporter commented 9 years ago
I don't see why BSON would be fundamentally easier or faster to parse. The only difference between BSON and this protocol is the newlines, and for a good reason: the newlines are in the protocol so that debugging a stream of data is as simple as attaching netcat and getting readable data, not for allowing people to use readline(). Moreover, I would highly suggest against this. Every argument is prefixed with its length, so there is no need to use readline(). Even better, every client lib author can simply do a read() and start traversing the buffer, using a simple index. Not that complex nor slow in my opinion.
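The read-and-index approach described here can be sketched as follows (illustrative code, not from any client library): one read() fills the buffer, and a single integer index walks it, skipping the CRLFs by arithmetic instead of readline():

```python
# Walk a fully-read multi-bulk reply with an index; no readline() needed.
def parse_multi_bulk(buf: bytes):
    pos = buf.index(b"\r\n")
    count = int(buf[1:pos])             # "*4" -> 4 items
    pos += 2                            # skip the CRLF
    items = []
    for _ in range(count):
        end = buf.index(b"\r\n", pos)
        length = int(buf[pos + 1:end])  # "$3" -> 3 payload bytes
        start = end + 2
        items.append(buf[start:start + length])
        pos = start + length + 2        # skip payload and trailing CRLF
    return items

reply = b"*4\r\n$3\r\nfoo\r\n$3\r\nbar\r\n$5\r\nHello\r\n$5\r\nWorld\r\n"
print(parse_multi_bulk(reply))  # [b'foo', b'bar', b'Hello', b'World']
```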

Original comment by pcnoordh...@gmail.com on 19 May 2010 at 8:25