vernemq / vmq_mzbench

An MQTT loadtest and usage scenario tool for VerneMQ and other MQTT systems.
Apache License 2.0

MQTT Pub to Sub Latency not shown when pool is large #10

Closed gdhgdhgdh closed 7 years ago

gdhgdhgdh commented 7 years ago

Hi (again)

I'm now getting sensible results, but have a query. For example, these three runs should be broadly similar, since each pair multiplies out to the same aggregate publish rate (10,000 msg/s):

mzbench run --nodes=4 --env poolsize=250 --env rate=40 mqtt-ramp.bdl

mzbench run --nodes=4 --env poolsize=500 --env rate=20 mqtt-ramp.bdl

mzbench run --nodes=4 --env poolsize=1000 --env rate=10 mqtt-ramp.bdl
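For reference, the publisher side of the scenario is roughly this. It's not the exact file: the broker address, topic, and payload size are placeholders, and the statement names follow the vmq_mzbench worker's BDL interface.

#!benchDL

make_install(git = "https://github.com/vernemq/vmq_mzbench.git",
             branch = "master")

# Each of the <poolsize> workers publishes at <rate> msg/s, so the
# aggregate publish rate is poolsize * rate in every run above.
pool(size = numvar("poolsize"),
     worker_type = mqtt_worker):

        connect(host = "10.0.0.1",        # placeholder broker address
                port = 1883,
                client = fixed_client_id("pub", worker_id()),
                clean_session = true,
                keepalive_interval = 60)

        wait(1 sec)

        loop(time = 5 min,
             rate = numvar("rate") rps):
            publish("bench/fan-in", random_binary(100), 1)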

All three scenarios execute successfully, and I see the expected throughput in the graphs. There is one small fly in the ointment: in the final run, where poolsize = 1000, the MQTT Pub to Sub Latency graph is empty.

This is completely repeatable. If I drop back to 500/20, the graph is drawn again. The pub/sub latency is very interesting for us, so I'd really like to know how to make it appear more reliably.

Oddly, I found that the 1000/10 run was the least stressful on VerneMQ of all three runs!

[screenshot: MQTT Pub to Sub Latency graphs for the three runs]

From left to right, the graphs are 250/40, 1000/10, 500/20. Similarly, the amount of network traffic was much lower on the 1000/10 run. Is this MZBench 'scaling back' its reporting and causing less stress on VerneMQ, or is it simply a pleasant side-effect of the pub/sub latency graph not being drawn?

We are using two VerneMQ nodes in a cluster. (Yeah, two is a bad number for quorum...)

ioolkos commented 7 years ago

What type of scenario is it? Fan-in?

The most common reason for not seeing an end-to-end latency is simply that there are no consumers connected. You can check this in the Messages pane of MZBench, which shows the number of published and consumed messages.

The reason for this is that the underlying client used by the MQTT workers is completely asynchronous, so you sometimes have to play around with wait times when setting up the consumers.
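As a rough sketch (host and topic are placeholders, statement names per the vmq_mzbench worker), a dedicated subscriber pool that is connected and subscribed before the publishers start looks like this:

# Subscriber pool: these consumers need to be connected and subscribed
# before any publisher starts, otherwise nothing is consumed and the
# Pub to Sub latency graph stays empty.
pool(size = 10,
     worker_type = mqtt_worker):

        connect(host = "10.0.0.1",        # placeholder broker address
                port = 1883,
                client = fixed_client_id("sub", worker_id()),
                clean_session = true,
                keepalive_interval = 60)

        wait(1 sec)
        subscribe("bench/fan-in", 1)

Because the statements don't block until the broker has acknowledged them, a wait() after connect/subscribe and before the publish phase is usually enough.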

gdhgdhgdh commented 7 years ago

It's fan-in. Thank you for the tip about timing. I've jiggled the wait() values around (currently a wait of 10 before the connect of the 2000-worker pool, and another wait of 10 before the loop), and that seems to be working :)
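In scenario terms, the publisher pool now looks roughly like this (same placeholders as the sketch above):

pool(size = 2000,
     worker_type = mqtt_worker):

        wait(10 sec)                      # let the subscriber pool connect first
        connect(host = "10.0.0.1",
                port = 1883,
                client = fixed_client_id("pub", worker_id()),
                clean_session = true,
                keepalive_interval = 60)

        wait(10 sec)                      # settle time for the async client
        loop(time = 5 min,
             rate = numvar("rate") rps):
            publish("bench/fan-in", random_binary(100), 1)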

Thank you!