vibe-d / vibe.d

Official vibe.d development
MIT License

Problem with massive parallel connections #1749

Open linux-support opened 7 years ago

linux-support commented 7 years ago

Hello, my aim is to replace existing applications that run with 60k+ concurrent connections per process with D and vibe.d.

So I started testing with the http_server example (vibe.d v0.8.0-beta.5) by adding vibe.core.core.sleep(2000.msecs) to the connection handler and opening 12k concurrent connections. I wanted to know how vibe.d performs when handling long-running and inactive connections.
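A minimal sketch of that modification (assuming the example's plain request handler; the fiber-aware sleep suspends only the current task, not the thread):

import vibe.vibe;
import core.time : msecs;

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    // keep the connection open/idle for two seconds before replying
    sleep(2000.msecs);
    res.writeBody("Hello, World!");
}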

Quite often I get a timeout error and 'ab' stops working. After performing some tests I reduced the connection count to 2k and 4k concurrent requests, but there are still timeout issues. E.g.:

$ ab -n12000 -c4000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1200 requests
Completed 2400 requests
Completed 3600 requests
Completed 4800 requests
Completed 6000 requests
Completed 7200 requests
Completed 8400 requests
Completed 9600 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 10297 requests completed

Is this a problem with vibe.d v0.8.0-beta.5? Is this problem reproducible on other Linux systems? (I was using Linux 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux.)

cu, Mario

s-ludwig commented 7 years ago

Based on other observations, it appears to be an issue with the 0.8.0 betas indeed. The most likely culprit is garbage collector pressure, which accidentally increased considerably due to the switch to std.experimental.allocator. I'll have a look at this in the coming days and should then hopefully have a solution or a more certain answer w.r.t. the actual cause.

linux-support commented 7 years ago

I did some further investigation... With an increasing number of concurrent connections the utilized RAM keeps rising, and when the load-testing tool is stopped the allocated memory is not freed. I will do some additional checks with a non-beta release.
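One quick way to check whether that memory is merely GC-retained rather than leaked is to force a collection once the load stops; a minimal sketch using druntime's GC API:

import core.memory : GC;

void checkRetention()
{
    GC.collect();  // run a full collection
    GC.minimize(); // return unused pools to the OS where possible
}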

linux-support commented 7 years ago

Same result with 0.7.31. However, I have 'DMD64 D Compiler v2.074.0' installed; as far as I know, this is not a tested/supported compiler.

s-ludwig commented 7 years ago

There are some changes/fixes in 0.8.0-beta.6 when using the "vibe-core" sub-configuration of "vibe-d:core".
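For reference, selecting that sub-configuration in dub.json looks roughly like this (a sketch only; the version is pinned to the beta mentioned above):

{
    "dependencies": {
        "vibe-d": "0.8.0-beta.6"
    },
    "subConfigurations": {
        "vibe-d:core": "vibe-core"
    }
}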

ade90036 commented 7 years ago

Hi,

I'm interested in using vibe.d in some of my micro-services. I have just ported a "service" I have running in production (JVM 1.8, Netty NIO) to D. I created a very rudimentary benchmark between the two, and the performance is disappointing.

-------------- Netty; version: 4.1.5
Concurrency Level: 50
Time taken for tests: 71.433000 seconds
Complete requests: 5000000
Failed requests: 0
Write errors: 0
Kept alive: 5000000
Total transferred: 10240000000 bytes
Requests per second: 69,995.66 [#/sec] (mean)
Time per request: 0.714 [ms] (mean)
Time per request: 0.014 [ms] (mean, across all concurrent requests)
Transfer rate: 143,351.11 [Kbytes/sec] received, -1 kb/s sent, 143,351.11 kb/s total

-------------- vibe.d/0.7.32
Concurrency Level: 50
Time taken for tests: 132.890000 seconds
Complete requests: 5000000
Failed requests: 0
Write errors: 0
Kept alive: 5000000
Total transferred: 65000000 bytes
Requests per second: 37,625.10 [#/sec] (mean)
Time per request: 1.329 [ms] (mean)
Time per request: 0.027 [ms] (mean, across all concurrent requests)
Transfer rate: 489.13 [Kbytes/sec] received, -1 kb/s sent, 489.13 kb/s total

To be more precise, I built the D binary with --build=release and --arch=x86_64. While running the test, the Netty application was utilising ALL the CPU cores, while the D one used only a single core.

I would like to test the parallel-processing fix @s-ludwig mentioned above. Should I simply update the vibe.d version to the 0.8.1 beta? Is there any other specific configuration I need to apply to the standard http_server example to try out this feature?

I think vibe.d has come a long way and I feel it is quite mature; however, I would hate to invest my time re-writing some of my applications only to find out that it doesn't perform as well as other applications and doesn't take advantage of a multicore architecture.

BTW, I'm using 0.7.32 with DMD on Mac.

regards

ade

PetarKirov commented 7 years ago

Hi @ade90036, thanks for posting those benchmark results. I believe that it's important to constantly evaluate the performance of competitors when developing a library.

Yes, be sure to compare with 0.8.1 and master (though I don't know if there would be much difference between the two). In general, when posting performance comparisons, be sure to also mention:

On the last two points: vibe.d has several I/O backends (libevent, libasync, win32, winrt, and vibe.d's own vibe-core), and as you can imagine, each has different characteristics. AFAIK, vibe.d currently uses libevent by default. So if you can, it would be nice to compare libevent to vibe-core and perhaps libasync. Including the dub package file (dub.json / dub.sdl) makes it clear which backend was used.
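For example, the backend is selected via the sub-configuration of the package in dub.json; a sketch mirroring the full file posted further down in this thread:

{
    "dependencies": {
        "vibe-d": "0.8.1"
    },
    "subConfigurations": {
        "vibe-d": "libasync"
    }
}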

About the compiler: as you probably know, there are three D compilers (DMD, LDC and GDC), all of which share DMD's frontend, but each with a distinct backend. The first two are preferred for now, as GDC uses a somewhat older version of the DMD frontend, though I believe it will catch up in a couple of months. The advantage of DMD is that it comes with the latest of everything (frontend, druntime and Phobos) and offers faster compile times. LDC, on the other hand, uses older versions of DMD's frontend (LDC 1.5.0 is based off DMD 2.075.1, while the latest stable DMD is 2.076.1 and the latest pre-release is 2.077.0-rc1); however, by virtue of its LLVM backend it offers support for more targets and superior optimization capabilities. On that last point, you may be interested in Jon Degenhardt's performance comparison of his data-processing utilities compiled with different LDC versions in combination with no, thin and full LTO (link-time optimization): https://github.com/ldc-developers/ldc/issues/2380. For his use case, LTO makes a noticeable difference in performance and, even more prominently, in executable size.
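As a rough illustration of trying this out (assuming ldc2 is on PATH; dub picks up extra compiler flags from the DFLAGS environment variable):

# build with LDC instead of DMD
dub build --build=release --compiler=ldc2

# rebuild with full link-time optimization (an LDC-specific flag)
DFLAGS="-flto=full" dub build --build=release --compiler=ldc2 --force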

See also: http://forum.dlang.org/post/oq03g5$1uud$1@digitalmars.com

ade90036 commented 7 years ago

Hi @ZombineDev, thanks for your prompt and detailed response.

Yes, I admit I didn't provide all the performance-comparison information upfront.

I wasn't aware of the different event-loop libraries/implementations that vibe.d can use. I read the main documentation and saw it mentioned that you can use different libraries, but I couldn't find a how-to anywhere within reason.

Thanks for the link, it is very helpful; however, I would like to point out that this knowledge is scattered, which certainly penalises the perception of the vibe.d framework.

vibe.d's claim is: "Powerful asynchronous I/O and web toolkit for D, providing a fiber based blocking programming model and an efficient API", but anyone who uses the "hello-world" example as the basis of a benchmark will immediately see that it is certainly NOT.

It is one thing to say that it is flexible and easy to use, which can only be evaluated over a long period of time; but if you say that it is asynchronous/fast, that can be evaluated immediately. If one of your statements doesn't live up to expectations, it is very hard to convince anyone that the rest of them are true.

Surely nowadays a web framework should automatically take advantage of all the CPU cores of the machine? Why does the standard configuration (hello-world) with libevent appear to run in single-threaded mode?

This could also explain why all the vibe.d benchmarks I have seen perform very poorly. The one I'm referring to is this: https://www.techempower.com/benchmarks/, which shows vibe.d to be slower than PHP5 Slim and Python Django, both of which are notorious for being horrendously slow.

You seriously need to fix this.

I will re-test different configurations and report back.

Regards

Ade

ade90036 commented 7 years ago

I have re-run a few tests; Java is still a million miles ahead!!!

What I have observed is that the Java process utilises ALL CPU cores and its utilisation reaches 400%, with the cooling fan running at full speed (like a gas-guzzling V8), while vibe.d only utilises 99% and the fan runs in silent mode. Maybe this should be food for thought.

macOS 10.12.6, Intel i7 2.3 GHz, 16 GB RAM

DMD64 D Compiler v2.076.1, vibe.d 0.8.2-alpha.2

Java JDK 1.8u121, Netty 4.1.5

Results:
Java Netty NIO: 61,505 req/s (mean)
vibe.d (libevent): 33,498 req/s (mean)
vibe.d (libasync): 30,290 req/s (mean)

dub.json (the only thing I changed between the libevent and libasync runs was the sub-configuration):

{
    "name": "vibehelloworldjson",
    "authors": [
        "ade90036"
    ],
    "dependencies": {
        "vibe-d": "0.8.2-alpha.2"
    },
    "description": "A simple vibe.d server application.",
    "copyright": "Copyright © 2017, ade90036",
    "license": "proprietary",
    "subConfigurations": {

        "vibe-d": "libevent"
    },
    "buildTypes": {
        "release": {
            "buildOptions": ["releaseMode", "optimize", "inline"]
        }
    }
}

Code borrowed from here: http://forum.dlang.org/thread/fldcbcchjwybakdocmmn@forum.dlang.org?page=1

import vibe.vibe;

void main()
{
    logInfo("Ready to listen on port 8080");
    setupWorkerThreads(16);        // spawn 16 worker threads
    runWorkerTaskDist(&runServer); // start one listener per worker thread
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort; // let the threads share the port
    settings.port = 8080;
    settings.bindAddresses = ["127.0.0.1"];
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req,
                   HTTPServerResponse res)
{
    res.writeBody("Hello, World!");
}

Although I believe it is wrong: it spawns 16 worker threads listening on the same port, which raises an error on Mac. You cannot bind to the same port twice.

Listening for requests on http://127.0.0.1:8080/
Failed to listen on 127.0.0.1:8080
(... the "Failed to listen" line repeats 15 times ...)
Task terminated with uncaught exception: Failed to listen for incoming HTTP connections on any of the supplied interfaces.
(... the "Task terminated" line repeats 15 times ...)

The command I'm using to run the application is the following:

dub run --build=release

I suspect that there is something majorly wrong. How do I achieve higher CPU utilisation? I could spawn multiple separate processes and put an nginx in front for load-balancing, but why would I need that? Shouldn't the application already utilise the full CPU? What the application appears to be doing is running on a single thread, where the CPU utilisation never goes above 99%.

Any ideas or suggestions?

Regards

Ade

s-ludwig commented 7 years ago

It seems like the reusePort property isn't handled correctly on OS X and thus only a single core is being used. I'll have a look at it.
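For context, reusePort boils down to setting SO_REUSEPORT on every listening socket before bind(); a rough Posix-level sketch (not the actual vibe.d patch; the option's numeric value is platform-specific):

import std.socket : Socket;
import core.sys.posix.sys.socket : setsockopt, SOL_SOCKET;

version (OSX)   enum SO_REUSEPORT = 0x0200; // macOS/BSD value
version (linux) enum SO_REUSEPORT = 15;     // Linux value

void enableReusePort(Socket s)
{
    int one = 1;
    // must be applied before bind() so the kernel distributes incoming
    // connections across all sockets bound to the same address/port
    setsockopt(s.handle, SOL_SOCKET, SO_REUSEPORT, &one, cast(uint) one.sizeof);
}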

s-ludwig commented 7 years ago

Okay, reusePort now works on OS X, too. I've also tagged a new alpha release (0.8.2-alpha.3), can you re-run with a dub upgrade --prerelease?

s-ludwig commented 7 years ago

BTW, increasing the number of threads above the number of logical cores (sometimes even physical cores) will usually degrade the performance somewhat. It can be useful to hide latencies introduced by tasks that occupy the thread longer than they should, but other than that staying with the default (i.e. simply not calling setupWorkerThreads) is usually the best bet.
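Following that advice, the earlier example reduces to something like this (a sketch; the worker pool is left at its default size, which matches the logical core count):

import vibe.vibe;

void main()
{
    // no setupWorkerThreads() call: the default pool size is used
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort; // one listener per thread
    settings.port = 8080;
    settings.bindAddresses = ["127.0.0.1"];
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req, HTTPServerResponse res)
{
    res.writeBody("Hello, World!");
}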

ade90036 commented 7 years ago

I have pulled the latest changes (0.8.2-alpha.3) and re-ran the test with reusePort and without the setupWorkerThreads call, but the performance is the same.

vibe.d(libevent): 35,665.50 [#/sec] (mean)

I did notice poor CPU utilisation, which at the time of my tests was comparable to the CPU utilisation of the NodeJS test (single-threaded).

(screenshot: CPU usage while benchmarking the vibe.d server, 2017-11-07 12:48)

N.B. the java process in the task list is the one running the benchmark and collecting the results.

This is the CPU utilisation I see when I run the Java version.

(screenshot: CPU usage while benchmarking the Netty server, 2017-11-07 12:58)

N.B. the second java process, with 200%+, is the Netty server; the other one is the benchmark program, which runs the requests and collects the results.

I have also expanded the test by adding other web frameworks, namely Go and Rust counterparts. The CPU utilisation of the following frameworks maxes out, in line with Java, and this translates into much higher req/sec figures.

rust rocket; version: 0.3.3 Requests per second: 101,308.87 [#/sec] (mean)

rust nickel; version: 0.10.0 Requests per second: 112,846.44 [#/sec] (mean)

fasthttpGo; version: 1.9.2 Requests per second: 74,392.58 [#/sec] (mean)

N.B. I'm using the same machine, the same test conditions and the same benchmark program as above.

s-ludwig commented 6 years ago

Do you see multiple "Listening for requests on http://127.0.0.1:8080/" messages now, or are there still any "Failed to listen on 127.0.0.1:8080" ones? I'll attempt to run my own benchmark on macOS to see if I get the same symptoms.

s-ludwig commented 6 years ago

Okay, I can confirm the issue. Although I have a number of VMs running in parallel, it's clear that the performance is degrading in multi-threaded mode. There must be some kind of lock contention happening. This happens with the libevent driver as well as with vibe-core.

ade90036 commented 6 years ago

@s-ludwig, yes the program starts and it logs successfully:

Listening for requests on http://127.0.0.1:8080/
Listening for requests on http://127.0.0.1:8080/
Listening for requests on http://127.0.0.1:8080/
Listening for requests on http://127.0.0.1:8080/

N.B. The number of messages equals the number of logical cores available.

One more thing I would like to point out: the CPU usage I observed when running vibe.d with libevent vs libasync was similar, if not identical.

Therefore, it seems that all libraries suffer from the same behaviour under the macOS operating system.

I will run the same tests on my Linux box and Windows box, just to confirm or deny that there is a CPU utilisation bottleneck on other OSes as well.

I will report the results of Java against vibe.d.

Regards

carun commented 6 years ago

With 0.8.2, I find that even with multiple worker threads, only one thread is ever active. This could be the reason why vibe.d's performance is poor.

import core.thread;
import vibe.d;
import std.experimental.all;

auto reg = ctRegex!"^/greeting/([a-z]+)$";

void main()
{
    writefln("Master %d is running", getpid());
    setupWorkerThreads(logicalProcessorCount + 1);
    runWorkerTaskDist(&runServer);
    runApplication();
}

void runServer()
{
    auto settings = new HTTPServerSettings;
    settings.options |= HTTPServerOption.reusePort;
    settings.port = 8080;
    settings.bindAddresses = ["127.0.0.1"];
    listenHTTP(settings, &handleRequest);
}

void handleRequest(HTTPServerRequest req,
                    HTTPServerResponse res)
{
    writeln("My Thread Id: ", to!string(thisThreadID));
    // simulate a long-running task (note: Thread.sleep blocks the whole
    // worker thread; vibe.d's fiber-aware sleep() would suspend only the
    // current task)
    Thread.sleep(dur!("seconds")(3));

    if (req.path == "/") 
        res.writeBody("Hello, World! from " ~ to!string(thisThreadID), "text/plain");
    else if (auto m = matchFirst(req.path, reg))
        res.writeBody("Hello, " ~ m[1] ~ " from " ~ to!string(thisThreadID), "text/plain");
}

In my case, the code always printed

PS C:\Users\Arun\Code\Personal\d\vibe-d-httpserver> .\vibe-d-httpserver.exe
Master 13396 is running
[vibe-3(h2ZR) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-6(rG0U) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-2(oqKZ) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-5(6XSK) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-8(md7h) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-1(Qmq1) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-7(APrD) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-4(6f6J) INF] Listening for requests on http://127.0.0.1:8080/
[vibe-0(yFIW) INF] Listening for requests on http://127.0.0.1:8080/
My Thread Id: 17100
My Thread Id: 17100

The output in the browser is the same as well.

Any pointers?

dan0mau commented 6 years ago

@carun Not sure if keep-alive might be messing up your thread results. The default keep-alive timeout for vibe.d is 10 seconds, and a browser will try to use keep-alive, which reuses the same connection (and, I think, may end up using the same thread?). If you use separate curl commands, do you get the same results?
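For example (each separate curl invocation opens a fresh TCP connection, so keep-alive reuse doesn't come into play; port as in the example above):

# three independent connections; the reported thread IDs should differ
curl http://127.0.0.1:8080/
curl http://127.0.0.1:8080/
curl http://127.0.0.1:8080/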

carun commented 6 years ago

@dan0mau It appears that Chrome multiplexes the connection; siege works fine though. See https://forum.dlang.org/post/egfqfjvaayeschgkcpwz@forum.dlang.org
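For reference, a siege run along those lines might look like this (flag values are illustrative):

# 50 concurrent simulated users, 100 repetitions each
siege -c 50 -r 100 http://127.0.0.1:8080/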

vitalka200 commented 6 years ago

Hi @s-ludwig, is there any progress on this?

Imperatorn commented 4 years ago

Solved?