My benchmark results

omid commented 3 years ago

I'm confused and wondering why the results are so different 🤔

I'm using the latest nightly, which is 1.54.0, and Node 16.0.0. All tests are "Read" only. I tested the same code from master on my local machine: desktop Linux, i7 3GHz, 8 cores (8 threads), 16GB memory at 2667MHz.

SA

Requests      [total, rate, throughput]         1060260, 17670.99, 17670.37
Duration      [total, attack, wait]             1m0s, 1m0s, 2.106ms
Latencies     [min, mean, 50, 90, 95, 99, max]  334.431µs, 2.219ms, 2.072ms, 2.965ms, 3.393ms, 5.35ms, 48.375ms
Bytes In      [total, mean]                     202156240, 190.67
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:1060260

DR

LATEST THROUGHPUT INFO
Requests      [total, rate, throughput]         588250, 9804.16, 9803.81
Duration      [total, attack, wait]             1m0s, 1m0s, 2.135ms
Latencies     [min, mean, 50, 90, 95, 99, max]  373.671µs, 3.867ms, 3.614ms, 5.928ms, 6.807ms, 8.765ms, 31.832ms
Bytes In      [total, mean]                     112159696, 190.67
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:588250 

PEM

LATEST THROUGHPUT INFO
Requests      [total, rate, throughput]         650226, 10837.10, 10836.52
Duration      [total, attack, wait]             1m0s, 1m0s, 3.258ms
Latencies     [min, mean, 50, 90, 95, 99, max]  421.593µs, 3.662ms, 3.206ms, 6.278ms, 7.571ms, 10.887ms, 135.706ms
Bytes In      [total, mean]                     124626650, 191.67
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:650226

I also ran it on another machine (laptop Linux, i9 2.4GHz, 8 cores (16 threads), 32GB memory at 3200MHz) and got completely different results:

SA

LATEST THROUGHPUT INFO
Requests      [total, rate, throughput]         1101823, 18363.66, 18363.24
Duration      [total, attack, wait]             1m0s, 1m0s, 1.366ms
Latencies     [min, mean, 50, 90, 95, 99, max]  365.511µs, 2.101ms, 1.935ms, 2.995ms, 3.492ms, 4.689ms, 43.751ms
Bytes In      [total, mean]                     210080948, 190.67
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:1101823

DR

LATEST THROUGHPUT INFO
Requests      [total, rate, throughput]         218746, 3645.42, 3645.34
Duration      [total, attack, wait]             1m0s, 1m0s, 1.195ms
Latencies     [min, mean, 50, 90, 95, 99, max]  536.327µs, 10.683ms, 10.52ms, 20.746ms, 24.02ms, 29.712ms, 136.249ms
Bytes In      [total, mean]                     41707600, 190.67
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:218746

PEM

LATEST THROUGHPUT INFO
Requests      [total, rate, throughput]         656688, 10944.79, 10944.22
Duration      [total, attack, wait]             1m0s, 1m0s, 3.115ms
Latencies     [min, mean, 50, 90, 95, 99, max]  553.575µs, 3.541ms, 3.302ms, 5.437ms, 6.413ms, 9.094ms, 162.15ms
Bytes In      [total, mean]                     125865200, 191.67
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:656688

Now I have two questions:

1 - Why is it completely different from what we have here in the blog post?

2 - Why is DR so different on these two machines?! (I ran them several times, and the results were almost the same each time.)
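To put a number on how differently the two machines behave, here is a minimal Rust sketch that divides each laptop throughput by the matching desktop throughput. The figures are the throughput values copied from the reports above; the helper name `throughput_ratio` is hypothetical and only used for illustration.

```rust
/// Ratio of one throughput figure to another (both in requests per second).
/// Hypothetical helper, not part of any benchmark tooling.
fn throughput_ratio(laptop: f64, desktop: f64) -> f64 {
    laptop / desktop
}

fn main() {
    // (server, desktop throughput, laptop throughput), copied from the reports above.
    let runs = [
        ("SA", 17670.37, 18363.24),
        ("DR", 9803.81, 3645.34),
        ("PEM", 10836.52, 10944.22),
    ];

    for (name, desktop, laptop) in runs {
        let ratio = throughput_ratio(laptop, desktop);
        println!("{name}: laptop/desktop = {ratio:.2}");
    }
}
```

With these inputs, SA and PEM land within a few percent of each other on both machines, while DR on the laptop reaches a bit under 40% of its desktop throughput, which is the gap question 2 is about.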

pretzelhammer commented 3 years ago

Wow, those throughputs are amazing! Your test machines seem to have specs similar to mine, so I'm also confused about why the numbers you are getting are so much better. Did you change the SA or DR code in any way?

1 - Why is it completely different from what we have here in the blog post?

The only noticeable difference between your setup and mine seems to be that you are running the benchmarks on a Linux machine (with PostgreSQL in a Docker container running Linux) and I'm running them on macOS (with PostgreSQL in a Docker container running Linux). Docker has to do more work to simulate a Linux environment on a macOS machine than on a Linux machine, of course... so it's possible I'm losing some performance in the virtualization layer... but even then, the difference in performance is so massive it's hard to believe...

That's my best guess, but even I'm not happy with that guess. Hopefully someone else can chime in? It would be interesting to see more people run the benchmarks on their machines and share what results they get. Maybe we can find some pattern in all this chaos.

2 - Why is DR so different on these two machines?! (I ran them several times, and the results were almost the same each time.)

No clue.

omid commented 3 years ago

This difference is the root of my confusion too! I made some changes, saw the difference, and thought it was because of my changes, but nope! It's exactly what you have in the main branch.

The only noticeable difference between your setup and mine seems to be that you are running the benchmarks on a Linux machine...

Yep, but it should be the same for all envs, so they should all get the same performance boost :/

weiznich commented 3 years ago

I believe I can explain at least some of those differences.

1- Why is it completely different from what we have here in the blog post?

This is likely caused by macOS being slow at using many threads compared to Linux. See here for details.

2 - Why is DR so different on these two machines?! (I ran them several times, and the results were almost the same each time.)

Did you use the default configuration provided by Rocket, or the configuration provided by @pretzelhammer's example? The likely cause here is that Rocket is quite sensitive to the number of workers. Increasing this number should increase performance (at least up to some point). Now it seems that your second test machine has more logical cores available, and the optimal number of worker threads depends on the core count. This suggests that you need to increase the number of worker threads even further to get comparable performance. I would try something like doubling the number to 512.

omid commented 3 years ago

Thanks @weiznich the first answer looks logical 🙏🏼

About the second one, I tried 512 and saw a worse result, around 180K. Setting it to 32 (logical CPUs * 2, based on their documentation) gave a similar result to before, around 218K.
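The "logical CPUs * 2" rule of thumb mentioned above can be sketched in plain Rust. This is only an illustration using the standard library's `std::thread::available_parallelism`; `suggested_workers` is a hypothetical helper, not part of Rocket's API, and the resulting number would still have to be set in Rocket's worker configuration by hand.

```rust
use std::thread;

/// Hypothetical helper: candidate worker count = logical cores * multiplier.
/// Both inputs are clamped to at least 1 so the result is never zero.
fn suggested_workers(logical_cores: usize, multiplier: usize) -> usize {
    logical_cores.max(1) * multiplier.max(1)
}

fn main() {
    // Ask the OS how much parallelism is available; fall back to 1 on error.
    // Assumption: this value approximates the logical core count. Inside
    // containers or under CPU limits it may be lower than the hardware count.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    println!(
        "logical cores: {cores}, candidate worker count: {}",
        suggested_workers(cores, 2)
    );
}
```

For the 16-thread laptop this rule gives 32, the value tried above. As the results in this thread show, a value derived this way is a starting point for measurement rather than a guaranteed optimum.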

pretzelhammer commented 3 years ago

I've updated the article to use a DigitalOcean Linux VPS as my benchmark test environment since that's a much more realistic environment than a MacBook Pro. The benchmarks should be more representative of real-world use cases, and switching to a Linux environment did significantly improve the performance of the DR server. Thanks for your help guys! I'm gonna close this issue.

pretzelhammer / rust-blog

My benchmark results #37