Use zero copy ffi.from_buffer for bytes types in `Socket.send()`

mayfield commented 8 years ago

With respect to Socket.send()...

In cffi 1.8 you can use ffi.from_buffer on bytes types which provides a nice perf boost over ffi.new() when the data is large. (30% in my tests for a ~1MB buffer).

It also might also be more efficient to avoid the type checking and just assume a from_buffer type will be used (fast path) and do any conversion to a bytes type in an exception handler (slow path).

I've not tested on python2 nor do I understand the ramifications of such a change to that platform.

djc commented 8 years ago

We can update Socket.send() to check for bytes as well, if so we probably need to update the dependencies so we always pull in cffi-1.8+.

Exceptions for control flow aren't the cleanest code, but if you're able to prove your case with a benchmark we can think about it.

mayfield commented 8 years ago

I'll throw a little something together. I played with this more last night and there is nearly 20% perf boost even for 1k messages on python3.5.

djc commented 8 years ago

Okay. Note that it has to work on Python 2.7 for this to be merged.

mayfield commented 8 years ago

Here are some simple benchmarks (not terribly scientific)..

nnpybench.pdf

Using this test code:

import time
import nnpy
import sys

PACKETS = 50000
MSG = b'\xff' * 1024

def echo(req, reply):
    req.send(MSG)
    got = reply.recv()
    assert got == MSG, got
    reply.send(got)
    got = req.recv()
    assert got == MSG, got

def setup(addr):
    req = nnpy.Socket(nnpy.AF_SP, nnpy.REQ)
    req.bind(addr)
    reply = nnpy.Socket(nnpy.AF_SP, nnpy.REP)
    reply.connect(addr)
    return req, reply

def test(req, reply):
    for i in range(PACKETS):
        echo(req, reply)

for addr in 'tcp://127.0.0.1:6423', 'inproc://foo', 'ipc://bar':
    req, reply = setup(addr)
    time.sleep(0.500)
    start = time.time()
    try:
        test(req, reply)
    except Exception as e:
        print("%20s, %25s,,,FAILED,%s" % (sys.argv[1], addr, e))
    else:
        took = time.time() - start
        pps = PACKETS / took
        print("%20s, %25s, %f, %d" % (sys.argv[1], addr, took, round(pps)))

And these two patches

instance_check patch

diff --git a/nnpy/socket.py b/nnpy/socket.py
index 2ce24d3..37bec69 100644
--- a/nnpy/socket.py
+++ b/nnpy/socket.py
@@ -67,7 +67,7 @@ class Socket(object):
         return errors.convert(rc, rc)

     def send(self, data, flags=0):
-        if isinstance(data, memoryview):
+        if isinstance(data, (memoryview, bytes)):
             buf = ffi.from_buffer(data)
         else:
             data = data.encode() if isinstance(data, ustr) else data

try_from_buffer_first patch

diff --git a/nnpy/socket.py b/nnpy/socket.py
index 2ce24d3..10d4d72 100644
--- a/nnpy/socket.py
+++ b/nnpy/socket.py
@@ -67,9 +67,9 @@ class Socket(object):
         return errors.convert(rc, rc)

     def send(self, data, flags=0):
-        if isinstance(data, memoryview):
+        try:
             buf = ffi.from_buffer(data)
-        else:
+        except TypeError:
             data = data.encode() if isinstance(data, ustr) else data
             buf = ffi.new('char[%i]' % len(data), data)
         rc = nanomsg.nn_send(self.sock, buf, len(data), flags)

djc commented 8 years ago

Okay, I'm happy to take your try_from_buffer_first patch. Can you add a little comment? It should probably mention (a) that it's there for performance reasons, (b) that the path taken depends on the cffi version.

mayfield commented 8 years ago

You bet. PR incoming.

wrobell commented 8 years ago

This FAQ entry is worth considering https://docs.python.org/3/faq/design.html#how-fast-are-exceptions.

Wouldn't it better to allow Socket.send to accept bytes/buffers only? IMHO, the method gets more and more complex and is not symmetric with Socket.recv. For example in pyzmq, there methods like Socket.send_string/Socket.recv_string.

Keep it simple and avoid explaining various nuances of Socket.send in documentation :)

djc commented 8 years ago

To some extent I feel that 5269d26 made it less complex, actually.

Do you have a concrete problem with the new behavior?

wrobell commented 8 years ago

I would just change the method from

    def send(self, data, flags=0):
        """
        ... docs plus explanation when zero-copy is performed and for which types of data the method is efficient...
        """
        # Some data types can use a zero-copy buffer creation strategy when
        # paired with new versions of CFFI.  Namely, CFFI 1.8 supports `bytes`
        # types with `from_buffer`, which is about 18% faster.  We try the fast
        # way first and degrade as needed for the platform.
        try:
            buf = ffi.from_buffer(data)
        except TypeError:
            data = data.encode() if isinstance(data, ustr) else data
            buf = ffi.new('char[%i]' % len(data), data)
        rc = nanomsg.nn_send(self.sock, buf, len(data), flags)
        return errors.convert(rc, rc)

to

    def send(self, data, flags=0):
        """
        Send data to the socket.

        The `data` is buffer (bytes, bytearray) object.

        :param data: Data to be sent.

        .. seealso:: `ffi.from_buffer <http://cffi.readthedocs.io/en/latest/ref.html#ffi-buffer-ffi-from-buffer>`_
        .. seealso:: `nn_send <http://nanomsg.org/v1.0.0/nn_send.3.html>`_
        """
        buf = ffi.from_buffer(data)
        rc = nanomsg.nn_send(self.sock, buf, len(data), flags)
        return errors.convert(rc, rc)

djc commented 8 years ago

That just makes it harder to use. IMO supporting strings is pretty useful, and doing so the way it is now makes sense in a language where "duck typing" is a best practice. Now, if you want to improve the docstring, I'd be happy to accept a pull request for that.

(Also, you have not stated a concrete problem with the new behavior.)

wrobell commented 8 years ago

According to linked FAQ entry above, catching an exception is expensive. It would be nice to make users aware about this and put into documentation of Socket.send - strings are supported, but your calls are slowed down because the conversion is handled via try/except clause. Knowing that, I will convert strings by myself, which makes Socket.send supporting strings not so useful anymore.

I would simply keep it simple and allow users to decide how to perform conversion (and there are plenty of ways including single dispatch :)).

djc commented 8 years ago

Expensive is a relative term. I prefer to make the library work for people who are just getting started with the simplest thing possible, which I guess does not fit your understanding of "simple", but sure is important for mine. On the other hand, people who need to squeeze out all the performance can do a bit more work to find out where the bottlenecks are, and if this particular method pops up they can check out the documentation and/or comments to figure out that there are faster ways to use it.

nanomsg / nnpy

Use zero copy ffi.from_buffer for bytes types in `Socket.send()` #34