Open vedavidhbudimuri opened 6 years ago
Hello, this could happen because the node failed to start. If you use a "static" host list, you can check this by going to the worker node and checking whether a "beam.smp" process is present.
If it failed to stop an older node, that could affect all subsequent benchmarks. Sometimes this happens because the load is too high.
Please let me know if killing "beam.smp" solves the problem.
Yeah, I found more than one beam.smp process, so I killed all of them and restarted the mzbench server, but I'm still facing the same issue.
Do you run the workers and the API server on the same host (i.e. do you use one host for everything)? Second thing: "beam.smp" could be automatically restarted by "heart", so you need to make sure that it dies completely, or consider killing "heart" as well.
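To check for leftover processes on the worker host, something like the following could help. It is only a sketch: it assumes a Linux host where `ps -eo pid,comm` is available, and the function names `find_leftovers` and `list_leftovers` are made up for illustration, not part of mzbench.

```python
import subprocess

# Process names mentioned in this thread; adjust if your setup differs.
TARGETS = {"beam.smp", "heart"}

def find_leftovers(ps_output, targets=TARGETS):
    """Return (pid, name) pairs from `ps -eo pid,comm` output whose command is in targets."""
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any malformed lines.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, name = int(parts[0]), parts[1].strip()
        if name in targets:
            found.append((pid, name))
    return found

def list_leftovers():
    """Run ps on the local host (assumed Linux) and report matching processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_leftovers(result.stdout)
```

Running `print(list_leftovers())` before starting a benchmark shows whether an old "beam.smp" or "heart" is still alive. The script only reports processes; deciding what to kill is left to you.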
@parsifal-47 sorry, I noticed the heart part later. I rebooted the system to kill heart completely and continued my load testing. It worked for a while, but after some time I'm getting this issue again.
Yeah, both my API server and the worker host are on the same system.
Looks like it fails to stop the node for some reason. Killing "beam.smp" and "heart" is not how it should work normally; I'll think about some other suggestions.
As far as I understand the situation: after some number of benchmarks it fails to stop the node, and you cannot start any more.
Is overall system utilization high? I could suggest one experiment: try running some number of benchmarks with less load. If the problem doesn't reproduce, then high load is the reason.
Please let me know the result in any case, Thanks!
Hey @parsifal-47, I have checked the CPU utilization, which is definitely not at 100%, and memory usage is not even 30%.
Are there any other things we can check to come to some conclusion?
Oh, I just realized that you developed your own plugin, I'll check the code, is it here: https://github.com/vedavidhbudimuri/emq_custom_plugin ?
@parsifal-47 Yeah, I tried a custom plugin, but it is for the MQTT broker, and anyway it is disabled. I'm using https://github.com/erlio/vmq_mzbench/ as the plugin for mzbench.
Great! This one is well-known and shouldn't cause a problem per se. Let me know if the scenario is also available online, and I'll try to reproduce your issue.
I have another idea. It could be the following:
the server could be crashing before stopping the node. I have no idea why, but it is possible.
To check that, please look at your server logs at
/opt/mzbench_api/log/error.log
or <your_mzbench_dir>/server/log/error.log
There could be some crash info there.
Also, on the dashboard you should see disconnect (red) and connect (green) messages, because the websocket is closed and reopened in this case.
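If the log is large, a small filter can pull out the likely crash entries. This is a rough sketch, assuming the server log uses the same line format as the error output quoted in this issue; the marker strings are taken from that output, and the helper names `crash_lines` and `scan_log` are hypothetical.

```python
from pathlib import Path

# Markers copied from the error output quoted in this issue;
# other kinds of crashes may be logged with different wording.
MARKERS = ("CRASH REPORT", "terminated with reason", "exit with reason")

def crash_lines(text, markers=MARKERS):
    """Return the lines of text that contain any of the markers."""
    return [
        line for line in text.splitlines()
        if any(marker in line for marker in markers)
    ]

def scan_log(path):
    """Read a log file and return its crash-related lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    return crash_lines(Path(path).read_text(errors="replace"))
```

For example, `scan_log("/opt/mzbench_api/log/error.log")` returns the matching lines from the first path mentioned above; pass the other path if your install lives elsewhere.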
I'm using mzbench to stress test an MQTT broker. It worked well in the beginning, but suddenly I'm getting the error below. I tried running the old benchmarks, which worked well earlier, and even they started showing the same error.
System config: 8-core, 32 GB system, Ubuntu
Error:
04:46:52.530 [error] [Undefined] <0.218.0> gen_server mzb_time terminated with reason: {timeout,{gen_server,call,[mzb_interconnect,get_director]}} in gen_server:call/2 line 206
04:47:03.892 [error] [Undefined] emulator Error in process <0.20270.0> on node 'mzb_director94_0@127.0.0.1' with exit value: {{badmatch,{error,timeout}},[{cpu_sup,measurement_server_init,0,[{file,"cpu_sup.erl"},{line,498}]}]}
04:47:13.342 [error] [Undefined] <0.218.0> CRASH REPORT Process mzb_time with 0 neighbours exited with reason: {timeout,{gen_server,call,[mzb_interconnect,get_director]}} in gen_server:call/2 line 206
04:47:14.884 [error] [Undefined] <0.154.0> Supervisor mzb_sup had child time_service started with mzb_time:start_link() at <0.218.0> exit with reason {timeout,{gen_server,call,[mzb_interconnect,get_director]}} in context child_terminated