swoole / swoole-src

🚀 Coroutine-based concurrency library for PHP
https://www.swoole.com
Apache License 2.0

Performance comparison #1401

Closed ghost closed 3 years ago

ghost commented 7 years ago

Could you provide some performance comparisons with libraries like ReactPHP and Amp?

Wulfklaue commented 6 years ago

A simple "hello world", on a 12 thread Ryzen

Note: Apache2 is a monster, eating 85 to 95% of the CPU along with a massive memory load.

In ranking CPU usage:

Memory usage for all three (Crystal, PHP+Swoole, and Go) is acceptable, in the 10 to 30 MB range during the test run. Apache2 was horrible at 300 to 500 MB, and the longer the test ran, the more Apache2 grew.

IO locking is about 40/60 (40% real CPU usage, 60% IO locking), judging from what shows up as red in htop.

That is all my tests so far.

==== Excuse me, I want to show something here: in the latest TFB benchmark run (2018-08), Swoole is number one in the test with MySQL (not PostgreSQL): https://www.techempower.com/benchmarks/#section=test&runid=9d5522a6-2917-467a-9d7a-8c0f6a8ed790


gouchaoer commented 6 years ago

A hello-world benchmark is useless. Real apps all have SQL/Redis/RPC/HTTP IO and complex logic that needs a framework.

gouchaoer commented 6 years ago

Why not run a more IO-heavy and complicated benchmark?

ghost commented 6 years ago

This was a great test to see the possible throughput, thanks! Did you make sure the threads were set correctly for all use cases? Go will use all available threads unless you restrict it, e.g. via GOMAXPROCS=1.

Also, what is Crystal, and could you also try Amp+Aerys and ReactPHP?

Wulfklaue commented 6 years ago

@gouchaoer Hello world apps do have a use: they can show you where the bottlenecks are in a language. When Apache+PHP eats twice the CPU power for half the resulting speed, you know there are bottlenecks.

IO-heavy or complicated tests will smooth out the results because you end up IO-blocked more often than not. When you test a database or another system, you are testing that database's limits, the network driver, etc.

Swoole only bypasses some PHP limits; it's like inserting an Nginx webserver into your PHP. Both are C-based and offer great performance, but the moment you start testing pure PHP functions, you are no longer testing Swoole / Apache / ... but the language's limits. As such, the comparison becomes unfair.

Go and Crystal will kick PHP in the behind (unless PHP calls a pure C-based function library) in regard to memory usage etc. So how fair is that comparison then?

I did more tests that involved more complicated PHP functionality, and PHP fell on its face compared to Crystal and Go, unless I limited the test to pure single function calls (that directly call the C code).

@ivanjaros

Go used all threads, as you can see from the CPU usage ( where it was getting more system blocked, as with the other languages ).

If you want real speed, I simply advise going for Go or Crystal. If you are looking for a simple webserver setup with the ease of PHP code (if you're used to it), you can use Swoole.

Note: Do NOT run Swoole in production. I ran into several issues ( crashes ) during my extended tests that made me dump PHP / Swoole.

savire commented 6 years ago

@Wulfklaue, can you please elaborate on what crashes you encountered during your tests in production?

I'm pretty interested in this project. Sadly, it looks like the owner of the project does not monitor this kind of inquiry, which should be addressed first if they want their work to be acknowledged, since no good developer wants crashes in production for their clients.

twose commented 6 years ago

@savire @Wulfklaue Swoole is now a "Production-Grade Async programming Framework for PHP". Quite a few Chinese companies use it in production, such as Tencent, Baidu, Huya... Swoft, a Swoole coroutine framework, has also released version 1.0. Swoole 1.x is more stable, while coroutines are the future; every developer can choose what they prefer. The author of Swoole is also working on pushing it internationally, and a list of production projects that depend on Swoole may be coming soon. We can wait with hope.

savire commented 6 years ago

@twose, yeah, I noticed what they promote. Honestly, I'm one of those people who use heavily customized versions, so I rarely use a provided framework as is. I also tried Swoole some time ago for some testing. That is why, if others have found serious crash-related issues, I would love to know whether there is a solution: if they claim it's production-ready and there is a chance of crashes, it won't be good if that's not taken care of.

twose commented 6 years ago

@savire you may notice that core PHP developer dstogov has contributed to the Swoole repository recently; most of the currently known issues should get solved. I think it is worth waiting for.

Wulfklaue commented 6 years ago

@savire It has been a long time (I do not even remember the issues correctly) and this was with the older version. The new tests I did with version 2.0.x were stable in my case. I need to check the version number, but it's technically the latest.

I did some more tests with the newer version.

Note these are on limited virtual servers. The main one has 2 CPU cores, and the two others have one core each. All run at 2 GHz, so not exactly fast setups. This was to test several languages with CockroachDB in a cluster: simply fetch some data, loop over it, and that is it. So no "hello world" but somewhat more real-world behavior:

Golang: Requests per second: 1475.43 [#/sec] (mean)

PHP + Nginx: Requests per second: 32.63 [#/sec] (mean)

PHP + Nginx + Pconnect: Requests per second: 36.18 [#/sec] (mean)

PHP + Swoole: Requests per second: 1036.04 [#/sec] (mean)

PHP + Swoole + Pconnect: Requests per second: 1571.12 [#/sec] (mean)
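To put these figures in proportion, here is a small sketch (Python, used only for the arithmetic) that divides each reported mean by the plain PHP + Nginx result. The numbers are copied from the results listed above; the dictionary keys and variable names are just illustrative labels.

```python
# Mean requests per second, copied from the ab results quoted above.
results = {
    "Golang": 1475.43,
    "PHP + Nginx": 32.63,
    "PHP + Nginx + pconnect": 36.18,
    "PHP + Swoole": 1036.04,
    "PHP + Swoole + pconnect": 1571.12,
}

baseline = results["PHP + Nginx"]

# Express every setup as a multiple of the plain PHP + Nginx throughput.
speedups = {name: rps / baseline for name, rps in results.items()}

for name, factor in sorted(speedups.items(), key=lambda item: item[1]):
    print(f"{name:<26} {factor:6.1f}x")
```

On these figures, Swoole with a persistent connection reaches roughly 48 times the PHP + Nginx throughput, and Go roughly 45 times.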

Yes, the PHP results are horrible. All the others use 100% CPU on both cores (of the main server). PHP uses at best 20% and gets stuck waiting to reconnect to CockroachDB. No amount of tweaking of PHP or Nginx was able to solve this.

Making an initial connection to CockroachDB is slow. Once you have this connection, you bypass that initialization. Swoole or Golang can create a single connection and keep it open, as it's not part of the coroutine. In other words, the connection is made once and reused for each API call.

PHP, because it builds up a connection on each request, is just destroyed in this test. It's a known issue that not even Pconnect can solve, because each request comes from a different thread/event, something that Pconnect has issues with.

So it is not a fair test, but it shows the issues that people can run into when running production code in combination with CockroachDB. PostgreSQL has less of an issue because it has a much faster handshake / connection buildup. But even then PHP is forced to reconnect all the time, whereas Go and Swoole can reuse that same connection.
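The effect of paying the connection handshake on every request can be sketched with a very rough model. This is only an illustration: the 20 ms connect cost and 2 ms query cost are hypothetical values, not measurements from these tests, and the function name is made up for the example.

```python
def requests_per_second(connect_ms, query_ms, reuse_connection, workers=1):
    """Rough throughput estimate for workers that handle requests one at a time."""
    # With reuse, the handshake cost is paid once up front and is ignored here;
    # without reuse, every single request pays it again.
    per_request_ms = query_ms if reuse_connection else connect_ms + query_ms
    return workers * 1000.0 / per_request_ms


# Hypothetical costs, chosen only to show the shape of the effect.
CONNECT_MS = 20.0
QUERY_MS = 2.0

reconnect = requests_per_second(CONNECT_MS, QUERY_MS, reuse_connection=False)
reused = requests_per_second(CONNECT_MS, QUERY_MS, reuse_connection=True)

print(f"reconnect on every request: {reconnect:.1f} req/s")
print(f"reuse one connection:       {reused:.1f} req/s")
print(f"speedup from reuse:         {reused / reconnect:.1f}x")
```

The slower the handshake is relative to the query itself, the larger the gap becomes, which fits the observation that the difference is less dramatic with a database that connects quickly.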

Note: this bench fetches data mostly from Node 1 (dual CPU + webserver Swoole/Go/PHP). Other tests, where the data is fragmented across several clusters:

Crystal: Requests per second: 373.17 [#/sec] (mean)

Go: Requests per second: 547.83 [#/sec] (mean)

PHP + Nginx: Requests per second: 32.40 [#/sec] (mean)

As we see, PHP is simply still dead in the water even when data is being fetched from other servers in the cluster: the whole connection bottleneck again. No Swoole tests on that one; I was mostly testing Crystal here, as Swoole's and Go's performance is plenty close.

Expect those results to be multiple times better when run on a more realistic server and not on a dual / single 2 GHz core setup :)

It's a little bit off-topic, but it's worth mentioning. Do I trust Swoole on real production servers? Still not 100% sure. While I see Tencent has a repository with Swoole, it does not strike me as a production repository.

https://github.com/Tencent/tsf

Tencent Server Framework is a coroutine and Swoole based server framework for fast server deployment which developed by Tencent engineers.

When I read this, it almost feels like they are describing a testing environment.

The issue is that it's hard to get any real information as to what and where Swoole is really deployed.

And if I am honest ... I do miss the ease of deploying a "secure" single binary on the server that Go and Crystal provide. It's more hack-proof than having your plain source files on the server, where even script kiddies can make alterations. Compiled Go / Crystal binaries make it more difficult and require a higher class of hacker to start changing binaries or manipulating memory. Let's say I have done enough maintenance and junk cleanup on WordPress websites to really hate the plain-source-code-on-a-webserver issue that PHP and others have.

This is, in my personal opinion, a big weakness of PHP. PHP needs to evolve to include coroutines, some sort of single-file compiled binary deployment (not phar) ... And from what I see, PHP is simply not changing with the times. Boy, is this off-topic.

cbone99 commented 6 years ago

@Wulfklaue, ah I see. Yeah, they are progressing. Thanks for the confirmation and all the test data.

You are correct: when it comes to real-world apps, they will be tested against many bottlenecks. It mainly comes down to IO, context switching, etc. Those probably come with the middleware too. So unless Swoole provides a custom solution like they did with MySQL, it might be for naught. I do see a PostgreSQL module being requested here.

I haven't tested connecting via their provided external lib since, as you've said, it's not fair because there is stuff involved in between. So my test some time ago just involved a custom template handler, which works by loading coded templates and processing them. In my case, with default NGINX and a single-core VPS, it produced around 2k reqs/sec. I didn't monitor the CPU usage while testing, though. Not sure why, but I had no luck enabling OpCache either. It's enabled for CLI mode, but it seems Swoole somehow doesn't use it.

Yes, I personally agree on the binary deploy option, since most modern deployments follow that scheme now, especially if we are focusing on Docker and the like. I do expect PHP can follow the pattern soon. Well, Swoole does provide the same functionality, although it still uses readable code. This also raises a question for me: does Swoole load all included/required PHP files into memory, or does it load them again whenever needed? If the latter, that will surely have some impact even with OpCache enabled.

Honestly, what I like about Swoole is that it supports multiple protocols on a single port, like http and wss on the same port. In production we probably still need to put NGINX in front, but it's pretty interesting to write them in a single source.

I did test the task module they have and found some issues too, but they can be handled with some care. Also, I'm kind of language-agnostic when it comes to my work, so I will use anything that fits the project's needs. So I personally would love to see where Swoole is going.

If they are indeed backed by those big companies, then seeing how they were able to bring a fresh perspective to PHP like this, I think they might be able to produce an alternative way to do single-binary deployment if needed. I do believe they could just bypass the step of parsing the script and run the PHP-generated bytecode instead, maybe something like what WASM does with a JS loader to load the assembly code, haha. Then users could just produce the compiled version and use a normal PHP script loader. I know they can.

@twose, yeah, I do expect more to happen here with Swoole. It's time for PHP to catch up with other languages: just stop the syntactic-sugar development and focus on performance enhancements.

embluk commented 6 years ago

Does anyone have any up to date tests against PHP/Swoole and Node.js? Maybe just on the standard HTTP server?

doubaokun commented 6 years ago

@Wulfklaue Thanks for the detailed testing report.

dkraczkowski commented 6 years ago

@embluk I have tested my microframework, which supports Swoole, and the results are pretty impressive. [screenshot of the benchmark results, 2018-03-19]

Please note that Slim, Zend Expressive, and Lumen are running a standard php-fpm+nginx setup, while Igni is running purely on Swoole. The test was performed with ab using 100 concurrent requests.

Wulfklaue commented 6 years ago

@cbone99:

This also raises a question for me: does Swoole load all included/required PHP files into memory, or does it load them again whenever needed? If the latter, that will surely have some impact even with OpCache enabled.

Swoole is a bit of a strange duck. If you include PHP files, it loads them each time. If you use include_once, it loads them the first time and never runs the code again after that.

So be careful when benchmarking code, because you may think benchmark results 1 = 2 = 3 ..., when in reality you're benchmarking run 1 with, and runs 2/3/4 without, the include_once code being executed.
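The include vs. include_once behavior in a long-lived worker can be mimicked with a tiny simulation. This is a Python analogy of the semantics described above, not Swoole or PHP code; the `Worker` class and the file names are hypothetical.

```python
class Worker:
    """Simulates a long-lived worker process that keeps state between requests."""

    def __init__(self):
        self.loaded = set()    # files already pulled in via include_once
        self.executions = []   # every time a file body actually ran

    def include(self, name):
        # Plain include: the file body runs on every call.
        self.executions.append(name)

    def include_once(self, name):
        # include_once: the file body runs only the first time in this process.
        if name not in self.loaded:
            self.loaded.add(name)
            self.executions.append(name)

    def handle_request(self):
        self.include("template.php")        # hypothetical file name
        self.include_once("bootstrap.php")  # hypothetical file name


worker = Worker()
for _ in range(3):
    worker.handle_request()

# Count how often each file body ran across the three requests.
runs = {name: worker.executions.count(name) for name in set(worker.executions)}
print(runs)
```

Across three requests the plain include runs three times and the include_once body runs once, so only the first request of a benchmark pays for the include_once code.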

If you establish a database connection in your include file, it will reuse that exact same connection, resulting in a massive speedup compared to native PHP, even without a "permanent" connection being used (see the results above with the 1000 vs 1500 requests).

Swoole is tricky to test because it breaks so many things that you are used to from PHP.

Opcache:

Now I do know for a fact that opcache works in CLI mode, as it's clearly showing the PHP files being cached in my setup, and the hit counter keeps going up correctly.

with default NGINX and a single-core VPS, it produced around 2k reqs/sec.

PS: My 32/1000/1500 reqs/sec was a realistic benchmark run against a remote server setup, so it also included all the protocol overhead etc.

2k req/s is not a bad result on a single core with nginx + PHP + benchmarking software (Apache Bench alone easily eats 15% of the CPU time). Add to this the issue of CPU cache misses with the constant context switching.

I do believe they could just bypass the step of parsing the script and run the PHP-generated bytecode instead

I would love to see that, but I feel that Swoole is already a huge project with only a few developers. This is also one reason it scares me to use it in production: you never know whether the project will stay in active development or die out like the dozen before it.

Whatever the result, if PHP does not step up fast to deal with the changing market, I feel that PHP's future is not going to look as positive anymore. More and more tech and focus is going into clustering and pay-per-usage services, and PHP by default does badly on CPU and memory usage compared to some of the newcomers (if you compare CPU/memory usage at equal performance).

@eaglewu

PHP/Swoole and Node.js ... I do not have the numbers here, but on a 6-core AMD Ryzen I hit 13k with Node (but it was stuck using one core). Trying to force it to use all cores resulted in one core being 100% taxed and the rest less so, with Node hitting the 32k range. The issue is that Node uses one core to synchronize the rest, and when that core is 100% taxed, the rest suffer. This is a known limit of NodeJS.

I really need to dig through my memory on that one, but Swoole was definitely hitting above 100k with more cores being used efficiently. I do not remember the exact result. Was it 130k or 140k or ...? That test was done too long ago (several months). Sorry about that.

I can say for sure that I found the results for NodeJS disappointing compared to Go, Crystal and Swoole (while massively better than raw PHP).

Also, the memory usage was ridiculous compared to the rest. Yes, even PHP+Swoole (if run without leaking code) stayed within 2 MB max per PHP process / thread, whereas NodeJS grew bigger (which was kind of ironic, given what an issue PHP's memory consumption was in the past, pre-7.0). But those are simple tests; the proof of the pudding is in running complex tests.

IljaN commented 6 years ago

Does it run on hhvm?

eechen commented 6 years ago

@IljaN HHVM does not support the Swoole PECL extension. Swoole can run on PHP5 and PHP7. I recommend running Swoole with PHP7, because Swoole's coroutines (like Go's goroutine/channel/select) only support PHP7.

cbone99 commented 6 years ago

Just curious: they suggest using NGINX as a reverse proxy, which will slow down performance, especially since their example just uses plain TCP. I tested this, and the result with the default NGINX configuration they provide is far worse than their raw custom HTTP server accessed directly.

I know they might not be focusing their effort on their custom HTTP server, but if they aren't, why do they provide advanced configuration for it, like running as a custom user, HTTP/2 support along with SSL, etc.? It just feels weird to suggest using NGINX if they want to claim that their work is production-ready.

On the other hand, if they aren't going to tell us to use their custom HTTP server for production, do they perhaps have a better way to glue it to NGINX, such as a direct UNIX domain socket? Wouldn't that perform much better than TCP, especially since it will probably run locally?

yellow1912 commented 6 years ago

This project is definitely interesting, but there is so little documentation and activity here. As others mentioned, it would be much more interesting to see how we can use it for a real-world example, such as handling a WordPress or WordPress-like site. The thing about nginx and php-fpm is that it's a proven solution that runs in real production environments. It's difficult to consider Swoole at this stage for any mission-critical project.

Wulfklaue commented 6 years ago

@yellow1912

Swoole cannot run an unmodified WordPress. You need to use the non-blocking MySQL drivers that Swoole provides. Add to this that some PHP features, like globals and specific error reporting, can bloat your memory usage.

Swoole is more for people who start fresh projects and can take into account the limitations that come with Swoole keeping PHP bootstrapped in memory. On the other hand ... you get performance in exchange. Just yesterday I did some tests again with WordPress vs Swoole (a custom framework to simulate a normal system with JSON, updates, inserts, and selects). I was getting 300 req/s, whereas WordPress managed 22 req/s for its front page without any modification.

If you want to see the performance:

https://www.techempower.com/benchmarks/#section=test&runid=f62c00e2-070f-4636-90a3-1ba2687271a4&hw=ph&test=json

  1. php-swoole 1,157,868
  2. amp 284,272
  3. php-php5 246,331

This is with simple JSON encoding... The results in the query / fortune tests are close to PHP because I used the blocking PHP MySQL drivers (I simply took over the PHP5 test scripts and ran them with Swoole).

A new version has been tested with the coroutines, but they fail hard on some issue. Same with the plaintext test. Now it's just a matter of figuring out where these issues come from (I'm fairly sure it has to do with an error during compilation).

I always found the coroutine drivers lacking (I had issues with them several times), and I suspect that matyhtf's work on adopting WeChat's coroutine code (alpha 4) is part of the solution, as WeChat's code is battle-tested like hell (hundreds of millions of users).

We shall see ... as with all projects, it takes time to work out the kinks. Nginx and php-fpm also took time to mature.

Documentation:

Yes, this is lacking. I found it better to simply look at the examples:

https://github.com/swoole/swoole-src/tree/master/examples

Activities

See the commits... Plenty of activity.

embluk commented 6 years ago

@Wulfklaue Wait WeChat runs using Swoole? Or is matyhtf just using code from WeChat?

Wulfklaue commented 6 years ago

@embluk

https://github.com/Tencent/libco

The 4.0 alpha is now using Tencent's (WeChat's) libco code.

embluk commented 6 years ago

@Wulfklaue

Ahhh right, fair enough. Did the Swoole team try to create their own coroutine code and then, when it did not work out, switch to libco?

Wulfklaue commented 6 years ago

@embluk

Not sure myself... I have seen this activity of splitting the code base into two versions over the last few weeks. The developers are Chinese, so communication is not very public / easy to find :)

I assume they are integrating libco to solve issues and/or test the difference between both implementations.

I know that WeChat has a massive number of users (963 million monthly active users in 2017), so it stands to reason that their code is much more optimized and bug-tested than the implementation that matyhtf wrote...

matyhtf commented 6 years ago

@embluk Libco is used to quickly verify feasibility. In the next version we will replace libco.

ghost commented 6 years ago

I left PHP 10 years ago because of all the maintenance it needed. I switched to Node, but then I wrote my servers completely in C. I changed to Java, then Golang, then Crystal. Native compiling is a nice feature, and if you need it, you know why, but if I had noticed Swoole back in 2015, I wouldn't have used Golang or Crystal. The power of Swoole and the ease of maintaining a PHP setup are now unbelievable. I do not understand the real need for a binary to protect against script kiddies; no such people should have direct access to the server.

Compared to those other languages, Swoole does a perfect job! Even though coding in Crystal is very easy, it's 10 times faster now with Swoole, with performance comparable to the big systems languages.

Unbelievable, but true: PHP is back! :)

ghost commented 6 years ago

What I found really interesting about benchmarking: while doing those benchmarks, you will see that the bigger the machine, the better Swoole performs. For example, testing Swoole against Node on a limited CPU, Swoole is just a little bit faster than a Node cluster. But with more CPU, Swoole performs more than twice as fast as clustered Node, in a simple HTTP hello world as well as on file IO.

It should also be said that PHP has a lot of optimized code (like all long-lived languages), so many algorithms perform better in PHP than in V8-optimized JavaScript. Benchmarking several code snippets shows PHP to be a really fast engine, which is now horribly fast and easy thanks to Swoole :1st_place_medal: :)

ghost commented 6 years ago

Btw: Techempower Round 17 :muscle:

seeme-o commented 5 years ago

One thing swoole is proving is how PHP is able to perform better when a different approach is taken. True that Node.Js/Go/etc may be able to do x, y, or z faster - but what if the relevant php extensions and some of PHP/Zend core changed direction as well?

seeme-o commented 5 years ago

@flddr there's a funny thing about platforms that claim the ability to compile to native instructions: only some code is actually able to do it. 1 + 1 can easily be broken down into native code: all CPUs are capable of understanding 8/16/32 or even 64-bit ints, and all CPUs have an addition instruction. However, no CPU has a native String type (although chars are represented as numbers) and no CPU has a String.split instruction. These are all provided through higher-level libraries. While 1 + 1 can be broken down to raw CPU instructions (which themselves provide a lot of the magic required for manipulating char strings), most notable functionality of systems like Node.js comes from pre-compiled libraries, and most raw JS cannot literally be converted directly to raw system instructions. Does your CPU have an instruction set for regular expressions etc.?

seeme-o commented 5 years ago

Let me make this clear, just because of the debate that I can see following: while it is possible to break a method like 'string.reverse' into native code, that is hardly different from such a method existing in pre-compiled code. Hypothetically, if an entire routine is broken down into native instructions in the same memory space, it will always run faster than having to call a function in a shared library, but at this point you're talking about a difference of a few CPU cycles, maybe a few nanoseconds. This difference is real, and in some cases it matters: a nanosecond isn't small when dealing with billions of bytes, or with an algorithm that may try to compute a set of numbers in many millions of different ways. But in reality Node.Js (and others) are close to useless without their many modules provided through shared libraries, and at this point it comes down to the glue that bonds it all together. This is important when it comes to the cries for more in-depth benchmarks: will such benchmarks include a suite of highly repetitive yet practically useless computations, or will they include many millions of calls into widely used modules for the given system?
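The scale argument is easy to check with a little arithmetic. The sketch below assumes a hypothetical extra cost of 5 ns per call into a shared library; that figure and the two call counts are illustrative assumptions, not measurements.

```python
def total_overhead_seconds(extra_ns_per_call, calls):
    """Total added time in seconds when each of `calls` calls costs some extra nanoseconds."""
    return extra_ns_per_call * calls / 1_000_000_000


# Hypothetical workloads, chosen only to show the difference in scale.
web_request = total_overhead_seconds(extra_ns_per_call=5, calls=1_000)
bulk_job = total_overhead_seconds(extra_ns_per_call=5, calls=2_000_000_000)

print(f"1,000 calls:         {web_request:.6f} s")
print(f"2,000,000,000 calls: {bulk_job:.1f} s")
```

Under these assumptions the overhead is a few microseconds for a request that makes a thousand library calls, but ten seconds for a job that makes two billion of them.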

sinasalek commented 5 years ago

When I first benchmarked Swoole myself, I couldn't believe the result! I thought something might be wrong, but nothing was wrong, and PHP really was faster than Node.js and even nginx! Considering that Swoole also supports SSL, there is no need to use any webserver at all, even for serving static files.

https://gist.github.com/nkt/e49289321c744155484c#gistcomment-2265226

embluk commented 5 years ago

@sinasalek Yes, it is amazing what Swoole is doing, it is really changing up the PHP world and where it can be used :)

kenashkov commented 4 years ago

Hi everyone, I did some tests myself a few months ago. My tests did not cover only a "hello world" but something closer to a real-world app, like connections, caching, and file operations. The results and source are here: https://github.com/kenashkov/swoole-performance-tests

Only Apache/mod_php and Swoole 4.4 are compared, sorry no Nginx.

fakharak commented 3 years ago

A hello-world benchmark is useless. Real apps all have SQL/Redis/RPC/HTTP IO and complex logic that needs a framework.

A "hello world" benchmark measures the performance of purely what you want to benchmark, excluding the (impurities of) I/O with external resources. What does that mean? The intention here is to show the performance of the technology (Swoole in this case) at its core, not of a business application programmed using this technology. When we combine technologies, we cannot tell exactly what caused the performance degradation: the technology itself? The I/O layer? The implementation of the other software engine? An inefficient algorithm from the developer who implemented the integration logic? Or the (database) drivers used for the interaction?