voryx / Thruway

PHP Client and Router Library for Autobahn and WAMP (Web Application Messaging Protocol) for Real-Time Application Messaging
MIT License

Why are RPC calls so slow? #175

Closed barbushin closed 8 years ago

barbushin commented 8 years ago

I was curious how long a call to a trivial function would take with Thruway on a local machine (Intel i7, SSD, Win 10 x64, PHP 5.6).

So I disabled all logging and output, ran the scripts from the CLI, and was very surprised by the results:

router.php

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Thruway\Peer\Router;
use Thruway\Transport\RatchetTransportProvider;

$router = new Router();
$transportProvider = new RatchetTransportProvider("127.0.0.1", 9090);
$router->addTransportProvider($transportProvider);
$router->start();

client_ping.php

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Thruway\ClientSession;
use Thruway\Peer\Client;
use Thruway\Transport\PawlTransportProvider;

Thruway\Logging\Logger::set(new Psr\Log\NullLogger());

$client = new Client("realm1");
$client->addTransportProvider(new PawlTransportProvider("ws://127.0.0.1:9090/"));
$client->on('open', function (ClientSession $session) {
    for($i = 0; $i < 30; $i++) {
        $session->call('func', [microtime(true)])->then(
            function ($start) {
                $time = microtime(true) - (string)$start;
                echo "Time: $time\n";
            },
            function ($error) {
                echo "Call Error: {$error}\n";
            }
        );
    }
});
$client->start();

client_pong.php

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Thruway\ClientSession;
use Thruway\Peer\Client;
use Thruway\Transport\PawlTransportProvider;

Thruway\Logging\Logger::set(new Psr\Log\NullLogger());

$client = new Client("realm1");
$client->addTransportProvider(new PawlTransportProvider("ws://127.0.0.1:9090/"));
$client->on('open', function (ClientSession $session) {
    $session->register('func', function ($args) {
        return $args[0];
    });
});
$client->start();

Results

Time: 0.02100920677185059
Time: 0.01913905143737793
Time: 0.01973581314086914
Time: 0.02045106887817383
Time: 0.02109289169311523
Time: 0.02178597450256348
Time: 0.02236199378967285
Time: 0.02300190925598145
Time: 0.02365303039550781
Time: 0.02436494827270508
Time: 0.0250709056854248
Time: 0.02570605278015137
Time: 0.02640104293823242
Time: 0.02705502510070801
Time: 0.02766609191894531
Time: 0.02826905250549316
Time: 0.02883100509643555
Time: 0.0294349193572998
Time: 0.03014111518859863
Time: 0.03057718276977539
Time: 0.03104710578918457
Time: 0.03148818016052246
Time: 0.03200292587280273
Time: 0.03266501426696777
Time: 0.03319597244262695
Time: 0.03388500213623047
Time: 0.03449392318725586
Time: 0.03490591049194336
Time: 0.03531002998352051
Time: 0.03569793701171875

Is it okay that:

  1. the fastest response time is 19 ms?
  2. the response time keeps growing?

I also tested on a Linux machine and the results were almost the same.

davidwdan commented 8 years ago

@barbushin That's a very good question. There are 2 reasons for that.

1 - The for loop is blocking. If you use the event loop's periodic timer, you should get a modest bump in performance.

example:

    // $loop is the client's event loop, e.g. $loop = $client->getLoop();
    // Timer::MIN_INTERVAL comes from React\EventLoop\Timer\Timer.
    $count = 0;
    $loop->addPeriodicTimer(Timer::MIN_INTERVAL, function ($timer) use (&$count, $session) {

        if ($count >= 30) {
            $timer->cancel();
            return; // don't issue an extra call after cancelling
        }
        $count++;

        $session->call('func', [microtime(true)])->then(
            function ($start) {
                $time = microtime(true) - (string)$start;

                echo "Time: $time\n";
            },
            function ($error) {
                echo "Call Error: {$error}\n";
            }
        );
    });

2 - By default, Ratchet has UTF-8 encoding checks enabled, which can really slow things down. Unfortunately, I don't think there is a way to disable them without extending the transport provider:

    public function handleRouterStart(RouterStartEvent $event)
    {
        $ws = new WsServer($this);
        $ws->setEncodingChecks(false);
...
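Filled out, an overriding subclass might look like the following sketch. It assumes the Ratchet 0.3-era API (`WsServer::setEncodingChecks`) and that `RatchetTransportProvider` exposes `$loop`, `$address`, and `$port` to subclasses; treat the socket wiring as illustrative rather than a copy of the stock provider:

```php
use Ratchet\Http\HttpServer;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;
use Thruway\Event\RouterStartEvent;
use Thruway\Transport\RatchetTransportProvider;

// Hypothetical subclass: same wiring as the stock provider, but with
// Ratchet's per-frame UTF-8 encoding checks switched off.
class FastRatchetTransportProvider extends RatchetTransportProvider
{
    public function handleRouterStart(RouterStartEvent $event)
    {
        $ws = new WsServer($this);
        $ws->setEncodingChecks(false); // skip UTF-8 validation of frames

        $socket = new \React\Socket\Server($this->loop);
        $socket->listen($this->port, $this->address);

        new IoServer(new HttpServer($ws), $socket, $this->loop);
    }
}
```

The router script would then use `$router->addTransportProvider(new FastRatchetTransportProvider("127.0.0.1", 9090));` in place of the stock provider.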

With those changes, I'm getting this on my laptop with PHP 5.5:

Time: 0.004225969314575195
Time: 0.004382133483886719
Time: 0.005103111267089844
Time: 0.003340959548950195
Time: 0.003995895385742188
Time: 0.002504825592041016
Time: 0.003161907196044922
Time: 0.002908945083618164
Time: 0.002924919128417969
Time: 0.003277063369750977
Time: 0.004749059677124023
Time: 0.005089998245239258
Time: 0.005650043487548828
Time: 0.004929065704345703
Time: 0.0041961669921875
Time: 0.004114151000976562
Time: 0.003968000411987305
Time: 0.003080129623413086
Time: 0.003146886825561523
Time: 0.002631902694702148
Time: 0.003406047821044922
Time: 0.002511024475097656
Time: 0.004240036010742188
Time: 0.00388789176940918
Time: 0.004359006881713867
Time: 0.004863977432250977
Time: 0.006148099899291992
Time: 0.006670951843261719
Time: 0.005499839782714844
Time: 0.004045963287353516
Time: 0.002842903137207031
cboden commented 8 years ago

Ratchet will soon have way faster UTF-8 checking! :-)

barbushin commented 8 years ago

Yes, but as I see it, every single request goes through the router (client1 -> router -> client2), so it looks like in some cases the router could become a bottleneck.

Is it possible to optimize Thruway so that clients of some local services work like this:

  1. client1 -> router -> routes (no other clients) -> client1
  2. client2 -> router -> routes (client1) -> client2
  3. router -> routes (client2) -> client1 // update the clients list on client1
  4. client1 -> request -> client2 // direct

That's how P2P works, and that's why it's so fast and stable.

I'm working on different kinds of SOA platforms, and this point is very important for high load and stability.

barbushin commented 8 years ago

Thank you guys so much for your answers! $loop->addPeriodicTimer() got the ping-pong round trip down to 1.6 ms, and disabling the UTF-8 check brought it to 1.3 ms.

I'll try to use Thruway in a very high-load project. I hope there will finally be something to contribute :)

mbonneau commented 8 years ago

@barbushin ,

I am glad to hear that worked for you. Those times seem very good.

As far as the question of bypassing the router and going directly from one client to another: WAMP is designed to build distributed systems out of application components which are loosely coupled.

The "loosely coupled" idea provides for an environment of flexibility. Any client that can connect to the router can communicate with any other client in the ecosystem and provide RPC endpoints, make calls, publish, and subscribe to topics.

The flexibility does come at a cost though. It is not as fast, and could never be as fast as P2P. But at the same time, you don't have to worry about peers being able to communicate directly with each other through firewalls and NAT etc.

oberstet commented 8 years ago

As @mbonneau mentions, WAMP always runs through a router - this provides decoupling and flexibility, at the price of "higher" latency compared to direct messaging.

The question of what is "high latency" can only be answered based on concrete requirements.

I did some quick measurements using Crossbar.io, and the round-trip latency of a call originating from a caller process, routed across Crossbar.io to a callee, all on one host, is as low as 400 microseconds.

https://github.com/crossbario/crossbarexamples/tree/master/benchmark/rpc_roundtrip

This is possible when running WAMP over Unix domain sockets, using RawSocket and MessagePack.
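A Crossbar.io transport configuration along these lines might look roughly like this (a sketch based on Crossbar.io's config format; the socket path is illustrative):

```json
{
    "type": "rawsocket",
    "endpoint": {
        "type": "unix",
        "path": "/tmp/crossbar.sock"
    },
    "serializers": ["msgpack"]
}
```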

Note: none of the above is about scalability. Throughput is something different. E.g., a single router worker process on Crossbar.io is able to route 30k calls/sec. Again: whether this is a bottleneck or not depends on the use case and application.

barbushin commented 8 years ago

So all client communication goes through one server-router, right? For my project I can say that this is really very bad, because:

  1. 30k/sec is not enough
  2. I'll definitely have a problem with bandwidth on the server-router
  3. I don't understand how it's possible to make failover and scalable server-routers (e.g. between different AZs in AWS)
  4. in some cases latency could be a critical problem

> But at the same time, you don't have to worry about peers being able to communicate directly with each other through firewalls and NAT etc.

Projects that are based on SOA principles usually have many servers, and clusters of servers, that implement different services and communicate with each other over some private network. It's not a big problem to organize direct connections between servers, but having just one server-router is going to be a big problem.

oberstet commented 8 years ago

(1) How many calls/sec do you need? Also: 30k/s is Crossbar.io using one CPU core.

Here is a test pushing Web requests (HTTP, not yet full WAMP) on Crossbar.io on multi-core (using 40 cores):

https://github.com/crossbario/crossbarexamples/tree/master/benchmark/web

(2) I doubt that ;) Modern hardware can easily saturate a 10GbE NIC, and with some tuning also 100GbE NICs. What hardware do you run?

(3) This is a WAMP-router-specific thing .. for Crossbar.io, we'll have that.

(4) If a 400 us RTT is too much, I am afraid you will have to look elsewhere. Making 2 processes on one machine talk to each other with a latency of, say, 10 us .. good luck ;) As a starter: look for a different OS/kernel.

> communicate with each other in some private network.

WAMP is designed to allow integration of wide-area distributed systems with parts behind NATs and the like. This is the standard case with IoT applications.

barbushin commented 8 years ago

@oberstet What about fail-over? With only one server-router, there is no way to talk about a stable architecture. If this server goes down, or the network connection to it breaks, then all services will stop working, right? And is there no way to use 2-3 server-routers with some balancer?

About 400us: sorry, I read it as 400ms.

barbushin commented 8 years ago

Guys, really, what about fail-over? With only one server-router, there is no way to talk about a stable architecture. If this server goes down, or the network connection to it breaks, then all services will stop working, right? And is there no way to use 2-3 server-routers with some balancer?

RafaelKa commented 8 years ago

@barbushin: what is the difference from a (balancer >===< routers) architecture if the balancer fails?

barbushin commented 8 years ago

@RafaelKa https://en.wikipedia.org/wiki/Load_balancing_(computing)#Relationship_to_failovers

RafaelKa commented 8 years ago

@barbushin: sorry, but my question relates to "how can you balance permanent connections"? If your OS crashes, RAID cannot help, can it?

barbushin commented 8 years ago

@RafaelKa Sorry, but are you asking whether there is any reason to use a balancer for building a fail-over cluster? :)

Usually the risk of the balancer failing is much lower. Also, there are a lot of hardware-based balancers that are extremely fast and stable. And if we're talking about cloud hosting (such as AWS), they provide guaranteed failover-stable balancers. Anyway, there are hundreds of ways to set up a failover balancer, but for now I don't see even one solution for building a failover Thruway router-server.

barbushin commented 8 years ago

@RafaelKa Okay, here is an example of building a failover setup without a balancer, but still with multiple router servers:

  1. There are 3 router servers
  2. The IP addresses of these servers are listed on every single client
  3. Before connecting to a router, the client chooses a random server IP and tries to connect
  4. If the connection fails, the client uses another IP

But the problem is that you cannot scale router servers like this, because the routers' connection state is not synchronized. Am I wrong?

And there is no need to "balance permanent connections", because WAMP clients will automatically reconnect if the connection fails.
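The client-side failover steps above can be sketched independently of Thruway (the helper and the fake connector below are purely hypothetical, to illustrate the retry logic; in a real client, $connect would open a WAMP transport):

```php
// Hypothetical sketch of the failover strategy described above:
// shuffle the known router URLs, try each until one accepts.
function connectWithFailover(array $routerUrls, callable $connect)
{
    $candidates = $routerUrls;
    shuffle($candidates); // step 3: pick routers in random order

    foreach ($candidates as $url) {
        try {
            return $connect($url); // e.g. open a WAMP transport here
        } catch (RuntimeException $e) {
            // step 4: connection failed, fall through to the next IP
        }
    }
    throw new RuntimeException('All routers are unreachable');
}

// Usage with a fake connector that only "router-b" accepts:
$session = connectWithFailover(
    ['ws://router-a:9090/', 'ws://router-b:9090/', 'ws://router-c:9090/'],
    function ($url) {
        if (strpos($url, 'router-b') === false) {
            throw new RuntimeException("refused: $url");
        }
        return "session@$url";
    }
);
echo $session, "\n"; // prints "session@ws://router-b:9090/"
```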

RafaelKa commented 8 years ago

Hmmm, but you can theoretically add an internal client to each router and implement synchronization of the data. If you look at the WAMP spec, there are sessions, subscriptions, etc. which would have to be cloned between routers.

barbushin commented 8 years ago

Sure, I can find some workaround for all of this. What I'm saying is that for now the Thruway implementation of WAMP does not look like a production-ready solution for high-load and failover services.

Everyone who is going to use it should keep in mind that if something goes wrong with the router server, all services will go down.

RafaelKa commented 8 years ago

Or use some backend with failover, like MySQL/Postgres/Redis (JSON), to store the data in.

barbushin commented 8 years ago

@RafaelKa But I can't see that implemented in Thruway. For now I see that session data is stored only in the router process memory, so you need to implement cross-router session sync yourself. Right?

RafaelKa commented 8 years ago

Yes, and proxying too, so that messages are delegated to the right client.

I like this balancing matter, and I cannot find anything about router balancing in the spec; could you please create a new ticket in wamp-proto to discuss it.

PS: Some thinking about balancing has led me to the thought that session reconnection and a disconnected-session timeout must be specified in the WAMP spec if some reliable fail-over backend is used to store the data.

RafaelKa commented 8 years ago

@barbushin: I thought about https://github.com/wamp-proto/wamp-proto/

RafaelKa commented 8 years ago

@oberstet: Is any router load balancing implemented in the Crossbar.io router?

mbonneau commented 8 years ago

@barbushin - I have approached this issue a few times with specific regard to Thruway. It is a difficult problem.

When designing fault tolerance, the actual goals and architecture of the application are very important. Some high-availability solutions for one application may be suboptimal for others and vice-versa. This fact makes a "silver bullet" solution difficult and problematic (for me to implement anyway).

For some situations it would be possible to create a HA router solution by treating the clients as the fault-tolerance point. Many of the applications I have built are heavy on what I call "backend clients". They serve database actions, do notifications, etc. These processes are easy to deploy on many "servers" and can connect to many routers to provide their services. This would allow you to have many routers to choose from to get to these endpoints. There are issues with this setup that would require the attention of the application architect to solve. For instance publish/subscribe would need to be implemented with multiple routers in mind and replication by back-to-back clients. RPCs offered by the "frontend clients" would not be accessible to frontend clients connected to other routers.
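One concrete shape of the "backend clients on many routers" idea, sketched with the same Thruway client API used earlier in this thread (the router URLs, the db.get procedure name, and the shared-loop wiring via start(false) are illustrative assumptions, not Thruway's documented HA mechanism):

```php
use React\EventLoop\Factory;
use Thruway\ClientSession;
use Thruway\Peer\Client;
use Thruway\Transport\PawlTransportProvider;

// Hypothetical sketch: one backend-client process offering the same
// procedure on two independent routers, sharing a single event loop.
$loop = Factory::create();

foreach (['ws://router-a:9090/', 'ws://router-b:9090/'] as $url) {
    $client = new Client("realm1", $loop);
    $client->addTransportProvider(new PawlTransportProvider($url));
    $client->on('open', function (ClientSession $session) {
        $session->register('db.get', function ($args) {
            return "value-for-{$args[0]}"; // pretend database lookup
        });
    });
    $client->start(false); // don't run the loop yet
}

$loop->run(); // serve both routers from one process
```

A caller connected to either router can then reach db.get; losing one router only cuts off the clients attached to it, not the backend service itself.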

As for a more tightly coupled implementation in Thruway, there is none. However, the architecture of Thruway is very modular. The Broker and the Dealer are individual modules that can be reworked or even replaced (with minor modifications) with a custom Broker or Dealer that does implement some sort of automatic replication.

This has also been discussed here: https://github.com/voryx/Thruway/issues/71

barbushin commented 8 years ago

@mbonneau Thank you so much for sharing your thoughts on this question. Matt, could you please take a look at this comment: http://stackoverflow.com/questions/35316766/wamp-protocol-scaling-out-routers-on-multiple-machines?noredirect=1#comment58544627_35316766 I very much like the idea used in the Bitcoin daemon network, so you can build an HA router cluster with a dynamic list of nodes and without any balancer. What do you think?

I think the same idea could be used to scale service servers.