uber / tchannel

network multiplexing and framing protocol for RPC
MIT License

Idea: libuv socket relay #920

Open Raynos opened 9 years ago

Raynos commented 9 years ago

After trying to get as much performance out of node as possible, it looks like the remaining CPU time is mostly spent on socket reads and writes.

There is an open PR (#916) for a minimal relay in node, and there is a hosted flame graph.

To truly get the next order of magnitude of performance we need to write the relaying code in a different language. One approach would be to write the actual relaying and socket logic in C.

I'm not a C/C++ programmer, but @Matt-Esch and I brainstormed an idea for implementing the actual relaying part (which is 90% of the flame graph) in C/C++.

libuv relay server

var net = require('net');
// Hypothetical native addon proposed in this issue
var LibuvTChan = require('libuv-tchannel');

var parser = new LibuvTChan();

// You get frames from the channel
parser.onFrame = onFrame;

// You create a TCP server in node
var server = net.createServer(onConnection);
server.listen(port);

function onConnection(socket) {
  parser.manageSocket(socket._handle);

  // tchannel Connection/Channel node.js code
}

// You create outgoing sockets in node
var socket = net.createConnection(port, host);
parser.manageSocket(socket._handle);

// tchannel Connection/Channel node.js code

// You can forward frames through the parser
parser.sendVolatileFrame(socket._handle, VolatileFrame);

// You can also send frames through the parser
parser.sendPersistentFrame(socket._handle, PersistentFrame);

// Any information for stats and logs will be
// sent to JavaScript
parser.onStats = onStats;

The idea is that all the actual TCP read and write logic is rewritten in C. This removes the overhead of node's TCP implementation.

This also removes all buffer manipulation overhead in node.js.

Interface

Parser.onFrame

The parser.onFrame function must be set in JavaScript and is a function that takes a VolatileFrame.

A VolatileFrame is backed by a buffer in C. A VolatileFrame can be any one of the N frame types in the protocol.

For our forwarding use cases, the VolatileFrame has a few read-only fields and a few mutable fields. The mutable fields are id and ttl.
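To make "backed by a buffer" concrete, here is a minimal sketch of a view object whose `id` and `ttl` accessors read and write the underlying frame bytes in place instead of copying them out. The offsets are assumptions based on our reading of the TChannel protocol document (a 16-byte header `size:2 type:1 reserved:1 id:4 reserved:8`, a call request payload that starts with `flags:1 ttl:4`, and call request type `0x03`); `VolatileFrameView` is a hypothetical name.

```javascript
'use strict';

// Assumed TChannel layout: size:2 type:1 reserved:1 id:4 reserved:8, then payload.
const HEADER_SIZE = 16;
const TYPE_OFFSET = 2;
const ID_OFFSET = 4;
// Assumed call request payload: flags:1 ttl:4 ..., so ttl starts 1 byte in.
const TTL_OFFSET = HEADER_SIZE + 1;
const CALL_REQ_TYPE = 0x03; // assumed call request type code

// Hypothetical view over a frame buffer owned by native code.
// Getters and setters touch the bytes in place; nothing is copied.
class VolatileFrameView {
  constructor(buffer) {
    this.buffer = buffer;
  }

  get type() {
    return this.buffer.readUInt8(TYPE_OFFSET);
  }

  get id() {
    return this.buffer.readUInt32BE(ID_OFFSET);
  }

  set id(value) {
    this.buffer.writeUInt32BE(value, ID_OFFSET);
  }

  get ttl() {
    if (this.type !== CALL_REQ_TYPE) return undefined;
    return this.buffer.readUInt32BE(TTL_OFFSET);
  }

  set ttl(value) {
    if (this.type !== CALL_REQ_TYPE) throw new Error('ttl only exists on call requests');
    this.buffer.writeUInt32BE(value, TTL_OFFSET);
  }
}

// Build a fake call request: header plus flags and ttl only.
const buf = Buffer.alloc(HEADER_SIZE + 5);
buf.writeUInt16BE(buf.length, 0);
buf.writeUInt8(CALL_REQ_TYPE, TYPE_OFFSET);
buf.writeUInt32BE(42, ID_OFFSET);
buf.writeUInt32BE(1000, TTL_OFFSET);

const frame = new VolatileFrameView(buf);
frame.id = 7;    // rewrites bytes 4..7 of buf
frame.ttl = 950; // rewrites bytes 17..20 of buf
```

Exposing only accessors keeps the JavaScript object tiny, which is the point of not materializing the whole frame.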

A VolatileFrame also has a persistent() method that returns a fully realized PersistentFrame object.

For performance reasons, the C implementation will recycle the VolatileFrame immediately after the onFrame call returns.
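The recycling rule can be illustrated with a small simulation: one buffer is reused for every incoming frame, so a handler that keeps a reference to the volatile object later sees different contents, while a handler that calls `persistent()` keeps its own copy. `FakeParser`, `emit`, and the copy-based `persistent()` are assumptions made for this sketch, not part of any existing library.

```javascript
'use strict';

// Hypothetical volatile frame: a thin wrapper over a shared, reused buffer.
class VolatileFrame {
  constructor(buffer) {
    this.buffer = buffer;
  }
  get id() {
    return this.buffer.readUInt32BE(4);
  }
  // Copies the bytes so the result survives after onFrame returns.
  persistent() {
    return { id: this.id, bytes: Buffer.from(this.buffer) };
  }
}

// Hypothetical parser that recycles one frame object for every read,
// as the proposed C implementation would.
class FakeParser {
  constructor() {
    this.shared = Buffer.alloc(16);
    this.frame = new VolatileFrame(this.shared);
    this.onFrame = null;
  }
  emit(id) {
    this.shared.writeUInt32BE(id, 4); // overwrite the previous frame in place
    this.onFrame(this.frame);
  }
}

const parser = new FakeParser();
const kept = [];   // wrong: holds on to the recycled object
const copied = []; // right: holds persistent copies

parser.onFrame = function onFrame(frame) {
  kept.push(frame);
  copied.push(frame.persistent());
};

parser.emit(1);
parser.emit(2);
// kept[0] and kept[1] are the same recycled object and both report id 2;
// copied[0] still reports id 1.
```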

This means you must do one of two things synchronously:

Note that our current relay implementation waits for identification on the socket. To make synchronous forwarding decisions, we will have to synchronously send a Declined error frame when a connection is not yet initialized.
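A sketch of what a synchronous decision could look like, assuming the relay keeps a set of identified handles and that the handler learns which handle a frame arrived on (the proposal does not specify this). Nothing is deferred: the frame is either forwarded before `onFrame` returns or answered with a pooled Declined error. The stub `parser`, `identified`, `routeFor`, and `makeDeclined` are hypothetical names for illustration.

```javascript
'use strict';

// Stub standing in for the proposed native parser; it just records sends.
const sent = [];
const parser = {
  sendVolatileFrame(handle, frame) {
    sent.push({ handle, kind: 'forward', id: frame.id });
  },
  sendPersistentFrame(handle, frame) {
    sent.push({ handle, kind: frame.kind, id: frame.id });
  },
};

// Handles that have completed the init handshake (hypothetical bookkeeping).
const identified = new Set(['in-1']);

// Hypothetical routing table: service name -> outbound handle.
const routes = new Map([['users', 'out-users']]);
function routeFor(frame) {
  return routes.get(frame.service);
}

// One reusable error frame object, mutated before each send.
const declined = { kind: 'error-declined', id: 0 };
function makeDeclined(id) {
  declined.id = id;
  return declined;
}

// All work happens before returning, so the recycled volatile frame is
// never touched after the native side reuses it.
function onFrame(sourceHandle, frame) {
  if (!identified.has(sourceHandle)) {
    parser.sendPersistentFrame(sourceHandle, makeDeclined(frame.id));
    return;
  }
  parser.sendVolatileFrame(routeFor(frame), frame);
}

onFrame('in-1', { id: 10, service: 'users' }); // identified: forwarded
onFrame('in-2', { id: 11, service: 'users' }); // not identified: declined
```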

Parser.manageSocket(handle)

If you have a TCP socket in node, you can pass its handle to libuv and it will manage reading all incoming frames for you.

Every time it reads a frame it calls Parser.onFrame.
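For comparison, this is roughly the per-read work that `manageSocket` would take out of JavaScript: accumulating TCP chunks and cutting them into frames by their length prefix. The sketch assumes each frame starts with a 2-byte big-endian `size` that counts the whole frame including a 16-byte header, per our reading of the protocol document; `FrameSplitter` is a hypothetical name.

```javascript
'use strict';

const HEADER_SIZE = 16; // assumed fixed TChannel header length

// Hypothetical JS equivalent of the native read loop: chunks in, whole frames out.
class FrameSplitter {
  constructor(onFrame) {
    this.onFrame = onFrame;
    this.pending = Buffer.alloc(0);
  }

  // Called for every 'data' chunk; a chunk may hold part of a frame or several.
  push(chunk) {
    this.pending = this.pending.length === 0 ? chunk : Buffer.concat([this.pending, chunk]);
    while (this.pending.length >= 2) {
      const size = this.pending.readUInt16BE(0);
      if (size < HEADER_SIZE) throw new Error('invalid frame size ' + size);
      if (this.pending.length < size) break; // wait for the rest of the frame
      // Emitted frames are views into pending, not copies.
      this.onFrame(this.pending.subarray(0, size));
      this.pending = this.pending.subarray(size);
    }
  }
}

// Build a frame of the given total size carrying a recognizable id.
function makeFrame(id, size) {
  const frame = Buffer.alloc(size);
  frame.writeUInt16BE(size, 0);
  frame.writeUInt32BE(id, 4);
  return frame;
}

const ids = [];
const splitter = new FrameSplitter((frame) => ids.push(frame.readUInt32BE(4)));

// Two frames delivered across three arbitrary TCP chunks.
const wire = Buffer.concat([makeFrame(1, 20), makeFrame(2, 16)]);
splitter.push(wire.subarray(0, 5));
splitter.push(wire.subarray(5, 30));
splitter.push(wire.subarray(30));
```

The `Buffer.concat` calls and the per-frame JavaScript objects are the kind of buffer manipulation the proposal wants to move into C.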

Parser.sendVolatileFrame(handle, VolatileFrame)

For doing efficient forwarding you can mutate the VolatileFrame emitted by onFrame and send it directly to a different handle.
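Since `id` is one of the mutable fields, a likely use is giving the forwarded frame an id from the outbound connection's id space and remembering the mapping so the response can be routed back. The bookkeeping is sketched below in plain JavaScript, assuming the 4-byte id sits at byte offset 4; `IdRemapper` is a hypothetical name and not part of the proposal.

```javascript
'use strict';

const ID_OFFSET = 4; // assumed position of the 4-byte frame id

// Hypothetical per-outbound-connection id allocator with a reverse map,
// so responses can be sent back under the caller's original id.
class IdRemapper {
  constructor() {
    this.nextId = 1;
    this.inflight = new Map(); // outbound id -> { handle, originalId }
  }

  // Rewrite the request id in place and remember where it came from.
  forwardRequest(frameBuf, sourceHandle) {
    const originalId = frameBuf.readUInt32BE(ID_OFFSET);
    const outId = this.nextId;
    this.nextId = ((this.nextId + 1) >>> 0) || 1; // wrap within uint32, skip 0
    this.inflight.set(outId, { handle: sourceHandle, originalId });
    frameBuf.writeUInt32BE(outId, ID_OFFSET);
    return outId;
  }

  // Restore the original id and return the handle to send the response on.
  routeResponse(frameBuf) {
    const outId = frameBuf.readUInt32BE(ID_OFFSET);
    const entry = this.inflight.get(outId);
    if (!entry) return undefined; // unknown or already completed request
    this.inflight.delete(outId);
    frameBuf.writeUInt32BE(entry.originalId, ID_OFFSET);
    return entry.handle;
  }
}

const remap = new IdRemapper();
const req = Buffer.alloc(16);
req.writeUInt32BE(500, ID_OFFSET); // caller chose id 500
const outId = remap.forwardRequest(req, 'in-1');
// req now carries outId and would be passed to sendVolatileFrame

const res = Buffer.alloc(16);
res.writeUInt32BE(outId, ID_OFFSET); // the server answers with outId
const replyTo = remap.routeResponse(res); // 'in-1'; res id is back to 500
```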

Parser.sendPersistentFrame(handle, PersistentFrame)

If you want to send a frame that is not being forwarded from an incoming frame, you can do so with sendPersistentFrame(). The JavaScript code is expected to keep a pool of persistent frame objects that it can mutate and send.

It's safe to assume that the persistent frame can be recycled and mutated again after the sendPersistentFrame() call is done.
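Because a persistent frame can be mutated again as soon as `sendPersistentFrame()` returns, a simple free-list pool should be enough on the JavaScript side. This is a sketch under that assumption; `FramePool`, the `{ type, id }` frame shape, and the ping type code `0xd0` are hypothetical or assumed, and `send` stands in for `parser.sendPersistentFrame`.

```javascript
'use strict';

// Hypothetical pool of reusable persistent frame objects.
class FramePool {
  constructor(create) {
    this.create = create;
    this.free = [];
    this.created = 0;
  }
  acquire() {
    if (this.free.length > 0) return this.free.pop();
    this.created++;
    return this.create();
  }
  release(frame) {
    this.free.push(frame);
  }
}

const pool = new FramePool(() => ({ type: 0, id: 0 }));

// Stand-in for parser.sendPersistentFrame: assumed to serialize what it
// needs before returning, which is what makes immediate reuse safe.
const wire = [];
function send(handle, frame) {
  wire.push(handle + ':' + frame.type + ':' + frame.id);
}

function sendPing(handle, id) {
  const frame = pool.acquire();
  frame.type = 0xd0; // assumed ping request type code
  frame.id = id;
  send(handle, frame);
  pool.release(frame); // safe: the send call has already returned
}

for (let i = 1; i <= 1000; i++) sendPing('conn-1', i);
// 1000 sends, but only one frame object was ever allocated.
```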

Big ideas

The big idea here is that a nodejs tchannel relay is just a ringpop cluster that manages connections.

The actual work of parsing TCP and writing to TCP is all handled in a really efficient shared C library.

Volatile Frame vs Persistent Frame

Volatile Frame

A VolatileFrame is created in C++ and is associated with a piece of memory that holds the actual frame buffer. A VolatileFrame exposes to JavaScript only the information that the relay code absolutely needs.

All volatile frames have the following fields:

The size field is hidden and only available in C++.

Depending on the frame type, a volatile frame can expose more information. Currently the only frame type that exposes more is CallRequest, which exposes the following fields:

Persistent Frame

A persistent frame can only be created from JavaScript. There is a separate persistent frame constructor for each frame type, and each constructor has mutable fields for every piece of information in the protocol document.
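A sketch of what per-type constructors might look like, assuming frames serialize to the header layout `size:2 type:1 reserved:1 id:4 reserved:8` and that ping request is type `0xd0` (both taken from our reading of the protocol document). `PersistentFrame`, `PingRequest`, `payloadSize`, and `writePayload` are hypothetical names; in the proposal the serialization would happen in C++, not in JavaScript.

```javascript
'use strict';

const HEADER_SIZE = 16;

// Hypothetical base class: every persistent frame has a mutable type and id;
// subclasses add their own mutable fields and override the payload hooks.
class PersistentFrame {
  constructor(type, id) {
    this.type = type;
    this.id = id;
  }
  payloadSize() {
    return 0; // no payload by default
  }
  writePayload(buf, offset) {
    // no payload by default
  }
  // Serialize with the assumed layout; Buffer.alloc zero-fills the reserved bytes.
  encode() {
    const buf = Buffer.alloc(HEADER_SIZE + this.payloadSize());
    buf.writeUInt16BE(buf.length, 0);
    buf.writeUInt8(this.type, 2);
    buf.writeUInt32BE(this.id, 4);
    this.writePayload(buf, HEADER_SIZE);
    return buf;
  }
}

class PingRequest extends PersistentFrame {
  constructor(id) {
    super(0xd0, id); // assumed ping request type code
  }
}

const ping = new PingRequest(1);
ping.id = 9; // fields stay mutable, so the object can be pooled and reused
const bytes = ping.encode();
```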

There are two ways of creating a persistent frame:

By moving only the socket and parsing code into C/C++, we can continue to reuse the following:

Our flame graphs clearly show that more than 90% of the CPU time is spent on forwarding and network logic that has very little to do with the rest of the node implementation; for example, only 2% of the process is unoptimized timeout logic and only 4% is unoptimized peer selection. Those parts still have room for optimization, but they are not the bottleneck.
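Amdahl's law gives a quick bound on what moving only the hot path can buy. This is a standard back-of-the-envelope estimate, not a measurement from this issue: if a fraction $p$ of CPU time is sped up by a factor $s$, the overall speedup is

$$
S(s) = \frac{1}{(1 - p) + \frac{p}{s}}, \qquad p = 0.9:\quad S(10) = \frac{1}{0.1 + 0.09} \approx 5.3, \qquad \lim_{s \to \infty} S(s) = \frac{1}{0.1} = 10
$$

So with the 90% figure above, even an arbitrarily fast C path caps the gain at about 10×, one order of magnitude, after which the remaining 10% (timeouts, peer selection, and the rest) becomes the bottleneck.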

Rather than investing in a complete reimplementation of the entire Hyperbahn system in a new language, including:

It would be ideal to write a minimal implementation of a relay in C/C++ to get our next order of magnitude in performance.

We could implement the minimal relay in Go or Java and shell out to it from node; however, that would be difficult. There is no standard way to call into Java or Go from node: you would first have to call into C++ and then into Go/Java. The real performance gains come from tight coupling to the V8 C++ interface, which keeps memory allocation overhead minimal, and tight coupling to node's TCPWrap C++ class and libuv, so that we can move just the minimal hot-path socket manipulation code into a non-JavaScript language.

How do we test this?

The existing node tchannel code has a large suite of integration tests that treat all networking code as a black box. Furthermore, we have a large suite of tests in rt/hyperbahn as well.

Because the relay server is designed to handle only socket reads, we can completely hide the fact that we are using C/C++ inside our connection.js class. The vast majority of our tests treat the connection class as a black box, which will allow us to reuse the existing Node.js tests to verify the C/C++ code.

Furthermore, writing C/C++ addons is a fully supported feature for any Node.js project. It's very easy to make binary code part of the entire engineering workflow, and it's fairly easy to expose C++ classes to JavaScript.
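One common way to ship such an addon without breaking environments where the binary is missing or was built for another Node ABI is to try the compiled module first and fall back to the existing JavaScript path. The module path `./build/Release/libuv_tchannel.node` and the `loadRelayBackend` helper are hypothetical; `requireFn` is passed in only so the fallback can be exercised without a real binary.

```javascript
'use strict';

// Try the compiled addon first; fall back to the pure-JS implementation
// if loading fails for any reason.
function loadRelayBackend(requireFn, nativePath, jsFallback) {
  try {
    return { kind: 'native', impl: requireFn(nativePath) };
  } catch (err) {
    return { kind: 'js', impl: jsFallback, reason: err.code || err.message };
  }
}

// Hypothetical path that node-gyp would produce for this addon.
const backend = loadRelayBackend(require, './build/Release/libuv_tchannel.node', {
  name: 'js-relay',
});
// Without a compiled binary, backend.kind is 'js' and backend.reason is
// 'MODULE_NOT_FOUND', so connection.js keeps working unchanged.
```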

cc @jcorbin @prashantv @blampe @mranney

blampe commented 9 years ago

I remember @breerly tossing around the idea of frame parsing as a C library that could be shared across languages.

If Node performance is leaving us wanting more, then I'd strongly prefer to go down this route rather than a Hyperbahn rewrite in Go. We're not domain experts here, but we can build something that works. Having one consistent implementation of the low-level protocol details that we can easily share across languages would be huge.

For an example of how Python could benefit from this: compiling frame parsing to Cython (that is, C but still with a bunch of overhead to support Python duck typing) gives us a ~10x speedup (!!).

jc-fireball commented 9 years ago

I guess this may be a long-term plan. I would imagine the infra work to support a C/C++ lib in current production will take a certain amount of time.

Raynos commented 9 years ago

The infra work to support C/C++ in production is not too bad. It's just a binary node library like any other binary node library (we already have binary libraries for farmhash etc).

HelloGrayson commented 9 years ago

We could use gRPC as a reference; their main grpc repo has a C core that is used by C++, Node.js, Python, Ruby, Objective-C, PHP, and C#.

https://github.com/grpc/grpc/tree/master/src/core

On Tue, Jul 14, 2015 at 12:12 AM, Jake Verbaten notifications@github.com wrote:

The infra work to support C/C++ in production is not too bad. It's just a binary node library like any other binary node library (we already have binary libraries for farmhash etc).
