satori-com / mzbench

MZ Benchmarking
BSD 3-Clause "New" or "Revised" License

Connections limit to 28k CCUs #162

Open AhmarRG opened 4 years ago

AhmarRG commented 4 years ago

I'm trying to connect 30k MQTT CCUs, but the number of connections tops out at 28225.

parsifal-47 commented 4 years ago

Hi Ahmar, is this about VerneMQ? Depending on which error you get, it may be specific to VerneMQ, but it could also be a limit of your system. In the second case, it could be fixed with ulimit.
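
To make the system-limit check concrete, here is a minimal Python sketch that reads the per-process open-file limit (the value ulimit -n reports) and compares it with a target connection count. Each TCP connection held by a process consumes one file descriptor. The function names and the headroom value are hypothetical, chosen only for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_descriptors(target_connections, soft_limit, headroom=1024):
    """Check whether soft_limit leaves room for target_connections sockets.

    headroom is an assumed allowance for log files, pipes and other
    descriptors the process needs besides its sockets.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= target_connections + headroom


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("enough for 30000 connections:", enough_descriptors(30000, soft))
```

If the soft limit turns out to be too low, it can be raised up to the hard limit for the current shell with ulimit -n before the benchmark is started.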

Thanks!

AhmarRG commented 4 years ago

@parsifal-47 Yes, it's with VerneMQ. The system limits are all set to higher values, and my JMeter scripts can cross 1 million, but the vmq tests stop at 28225 CCUs.

Is this the correct way to update the listener.max_connections limit?

make_install(git = "https://github.com/erlio/vmq_mzbench.git",
             branch = "master")
pre_hook():
     exec(all, "sudo chmod a+rwx /etc/vernemq/vernemq.conf")
     exec(all, "sudo sed -i '/listener.max_connections = 400000/c\\listener.max_connections = 450000' /etc/vernemq/vernemq.conf")
     exec(all, "sudo service vernemq restart")

parsifal-47 commented 4 years ago

Oh, I see. I would recommend asking the VerneMQ devs about listener.max_connections here: https://github.com/vernemq/vmq_mzbench/issues

They should have tested beyond 30k clients. Also, the limit you hit may be caused by Erlang's maximum number of processes, which is roughly 30k by default: http://erlang.org/documentation/doc-5.8.4/doc/efficiency_guide/advanced.html

This part is easy to fix: if I remember correctly, you just need to specify the env variable vm_args=+P 1000000, where 1000000 is the new limit.
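
One way to check that a +P value actually reaches an Erlang VM is to start a VM with the flag and print erlang:system_info(process_limit). The Python sketch below wraps that call. It assumes an erl binary is on PATH, and the helper names are hypothetical. It does not show how MZBench itself forwards vm_args to its nodes.

```python
import shutil
import subprocess


def erl_process_limit_command(limit):
    """Build an erl command line that prints the VM's process limit."""
    return [
        "erl", "+P", str(limit), "-noshell", "-eval",
        'io:format("~p~n", [erlang:system_info(process_limit)]), halt().',
    ]


def query_process_limit(limit):
    """Return the process limit reported by a VM started with +P limit.

    Returns None when erl is not installed or the call fails.
    """
    if shutil.which("erl") is None:
        return None
    try:
        result = subprocess.run(
            erl_process_limit_command(limit),
            capture_output=True, text=True, timeout=60, check=True,
        )
        return int(result.stdout.strip())
    except (subprocess.SubprocessError, OSError, ValueError):
        return None


if __name__ == "__main__":
    print("reported process limit:", query_process_limit(1000000))
```

Newer Erlang releases may round the requested value up, so the reported limit can be larger than the number passed to +P.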

alinanasir01 commented 4 years ago

Hi Renat, changing vm_args didn't work. We are trying to cross 28k connections on a single machine, but the number of connections/subs always stops at 28228 or 28229. Is there any other way to go about it? Another issue is that we can only create 60 nodes in a single cluster; with any more than that, the director node fails.

parsifal-47 commented 4 years ago

Hi, could you share the error message that you get?

alinanasir01 commented 4 years ago

There is no error message at first. The connections just stop at 28228 and the test keeps running; after ~8 minutes, "Pool crashed" or "Dynamic deadlock detected" appears. The system error logs follow:

05:39:08.344 [error] [ API ] <0.472.0> Benchmark result: FAILED Dynamic deadlock detected

05:39:08.403 [error] [ API ] <0.390.0> Stage 'pipeline - running': failed Benchmark has failed on running with reason: {benchmark_failed,{asserts_failed,1}} Stacktrace: [{mzb_pipeline,error,2, [{file,"/home/ubuntu/mzbench/server/src/mzb_pipeline.erl"}, {line,90}]}, {mzb_pipeline,'-handle_cast/2-fun-0-',6, [{file,"/home/ubuntu/mzbench/server/src/mzb_pipeline.erl"}, {line,172}]}]

parsifal-47 commented 4 years ago

Oh, this is something different from what I expected. This message means that there is a worker (thread) waiting for a signal that no other live worker can generate.

There should be something prior to that; I mean, there should be a reason why these workers failed to start. Is there no other error before it?

alinanasir01 commented 4 years ago

These are the system error logs for another test that stopped at 28228 connections:

2:19:50.053 [error] [mzb_director38_0@127.0.0.1] <0.280.0> Received DOWN from pool <15165.278.0> with reason noconnection

12:19:50.055 [error] [mzb_director38_0@127.0.0.1] <0.610.0> Could not connect to node at "3.104.53.85":4804 with reason: econnrefused

12:19:50.070 [error] [ API ] <0.468.0> Benchmark result: FAILED Unexpected error: {pool_crashed, {gen_server,call,[mzb_director,attach,infinity]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,223}]}, {mzb_bench_sup,get_results,0, [{file,"/tmp/bench_mzbench_api_ip-172-31-38-143_1582_543611_613416/deployment_code/node/apps/mzbench/src/mzb_bench_sup.erl"}, {line,52}]}, {mzb_management_tcp_protocol,'-handle_message/2-fun-0-',1, [{file,"/tmp/bench_mzbench_api_ip-172-31-38-143_1582_543611_613416/deployment_code/node/apps/mzbench/src/mzb_management_tcp_protocol.erl"}, {line,58}]}]

12:19:50.072 [error] [ API ] <0.387.0> Stage 'pipeline - running': failed Benchmark has failed on running with reason: {benchmark_failed, {unexpected_error, {pool_crashed,{gen_server,call,[mzb_director,attach,infinity]}}, [{gen_server,call,3,[{file,"gen_server.erl"},{line,223}]}, {mzb_bench_sup,get_results,0, [{file, "/tmp/bench_mzbench_api_ip-172-31-38-143_1582_543611_613416/deployment_code/node/apps/mzbench/src/mzb_bench_sup.erl"}, {line,52}]}, {mzb_management_tcp_protocol,'-handle_message/2-fun-0-',1, [{file, "/tmp/bench_mzbench_api_ip-172-31-38-143_1582_543611_613416/deployment_code/node/apps/mzbench/src/mzb_management_tcp_protocol.erl"}, {line,58}]}]}} Stacktrace: [{mzb_pipeline,error,2, [{file,"/home/ubuntu/mzbench/server/src/mzb_pipeline.erl"}, {line,90}]}, {mzb_pipeline,'-handle_cast/2-fun-0-',6, [{file,"/home/ubuntu/mzbench/server/src/mzb_pipeline.erl"}, {line,172}]}]

I'm sorry this is all pretty vague, I am very new to mzbench/vmq.

parsifal-47 commented 4 years ago

No problem. It means that the limit is hit on 3.104.53.85:4804, which is the MZBench node's internal connection port, so the problem is on the MZBench side. It looks like vm_args has no effect; I need to check how to make sure this parameter is propagated to the Erlang VM.

alinanasir01 commented 4 years ago

Thank you!

alinanasir01 commented 4 years ago

Hi, any updates on the issue?

parsifal-47 commented 4 years ago

Hi, sorry for the delay. vm_args was a false lead, since the limit is already set to 800k for the MZBench node here: https://github.com/mzbench/mzbench/blob/master/node/rel/files/vm.args

I would expect some other system-level limit. The very first thing I should have asked: how do you run MZBench? Is it Docker, RPM, or from sources? Is it the Erlang 21+ version from here: https://github.com/mzbench/mzbench or the Erlang 17 version from the repo we are currently in?
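
The thread does not identify which system-level limit is involved. As one candidate to rule out (an assumption, not something confirmed here), every outgoing TCP connection from one source IP to one destination IP and port needs its own local port, so the Linux ephemeral port range caps such connections. A common Linux default range of 32768 to 60999 contains 28232 ports. The Python sketch below counts the ports in the configured range; the function names are hypothetical.

```python
from pathlib import Path

# Linux exposes the ephemeral port range as two integers: "low high".
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_count(text):
    """Count the ports in a 'low high' range string, bounds inclusive."""
    low, high = (int(part) for part in text.split())
    return high - low + 1


def read_port_range_text(path=PORT_RANGE_FILE):
    """Return the raw range string, or None if the file is unavailable."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_port_range_text()
    if text is None:
        print("port range file not available on this system")
    else:
        print("usable ephemeral ports:", ephemeral_port_count(text))
```

If the count printed on the load-generating machine is close to the observed connection ceiling, the range can be inspected and widened through the net.ipv4.ip_local_port_range sysctl, or the load can be spread over more source IPs or nodes.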

Thanks

alinanasir01 commented 4 years ago

Hi, we run MZBench via RPM. We have tried to run this repo with Erlang 17 and the other repo (https://github.com/mzbench/mzbench) with Erlang 21+; both give the same 28k limit.

parsifal-47 commented 4 years ago

Is it CentOS 7 or Amazon Linux? I should have updated these RPMs a long time ago.

alinanasir01 commented 4 years ago

We have been using Ubuntu.

parsifal-47 commented 4 years ago

Hi, I just published an updated RPM. Please check if it resolves the issue.

Thanks!

alinanasir01 commented 4 years ago

Hi, the RPM wasn't working for us this time around, so we cloned the git repo and ran MZBench from that, but we still hit the 28k limit.

dhruvjain99 commented 3 years ago

@parsifal-47 @ioolkos I am trying to create 1M connections using MZBench. Beyond 800k connections, I get the following error.

23:20:13.632 [error] [mzb_director369_0@127.0.0.1] <0.301.0> Received DOWN from pool <15299.280.0> with reason noconnection
--
23:20:13.638 [error] [mzb_director369_0@127.0.0.1] <0.2783.0> Could not connect to node at "10.120.11.108":4804 with reason: econnrefused
16:20:13.645 [error] [ API ] <0.9928.4> Benchmark result: FAILED Unexpected error: ...

ioolkos commented 3 years ago

@djcruz93 please use the mzbench/mzbench repo for new issues. I'm unlikely to be able to help, and certainly cannot be of use without further information. You hit an econnrefused, so it seems that different components in MZBench cannot connect to each other.