softwaremill / mqperf

https://softwaremill.com/mqperf/
Apache License 2.0

[Artemis] Migrate to 3 masters topology #47

Closed ghostbuster91 closed 4 years ago

michaelandrepearce commented 4 years ago

I would suggest validating whether this actually improves performance for your setup.

franz1981 commented 3 years ago

@adamw @ghostbuster91 I believe this configuration isn't optimal for scaling, given that right now producers and consumers risk being paired with different live brokers, forcing message redistribution among brokers (and replication too): we're still working on a solution that would correctly suggest which broker to connect to, i.e. affinity.

IMO, with the number of clients in the test, using a single live broker is enough, while the others are there just to protect against split-brain scenarios. I hope that helps: that's probably why the results are so different from 3 years ago :)

I suggest turning off message-load-balancing and setting a non-existent, bogus address name as the one to be redistributed (to save notifications from moving across cluster nodes); it should scale much better than just using several nodes.

adamw commented 3 years ago

So a setup with a single live broker, single backup and one standby node to resolve splits would perform better than 3 live-backup pairs working in parallel?

franz1981 commented 3 years ago

with a single live broker, single backup and one standby node to resolve splits would perform better than 3 live-backup pairs working in parallel?

In general yes, in the absence of a smart (or ad-hoc) partitioning of work/clients among broker nodes. This is explained well by the Universal Scalability Law (http://www.perfdynamics.com/Manifesto/USLscalability.html), which shows that the cross-talk penalty associated with coherence communications (i.e. message redistribution across nodes, acks, notifications, etc.) can lead to sharply diminishing, even retrograde, returns in scalability, e.g.:

[image: USL scalability curve]
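For reference, the Universal Scalability Law linked above models relative capacity on N nodes with a contention term and a coherence (cross-talk) term; the quadratic coherence term is what eventually makes throughput drop as nodes are added (symbol names below follow Gunther's usual notation):

```latex
% Universal Scalability Law: relative capacity C(N) on N nodes
% \sigma = contention (serialization) penalty
% \kappa = coherence (cross-talk) penalty, e.g. message redistribution, acks, notifications
C(N) = \frac{N}{1 + \sigma\,(N - 1) + \kappa\,N\,(N - 1)}
```

With any non-zero coherence penalty, the N(N - 1) term eventually dominates, which is what produces the retrograde part of the curve.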

That looks very similar to what we get in your bench results :)

Regarding this part of your comment:

with a single live broker, single backup and one standby node

Right now, with the current quorum vote implementation, we are forced to have 3 live pairs, because backups won't participate in the vote. This is something we're addressing for the next release, using a different quorum algorithm. Hence the suggested topology would be to use 3 lives, no message redistribution among them, and a single backup serving one specific live. Clients should just connect to a single live and only move to the backup if that live fails.
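As a rough illustration of "a single backup to serve one specific live", here is a minimal broker.xml sketch using replication group names to pin a backup to its live; the group name and the surrounding configuration are assumptions, not taken from this repo's actual config:

```xml
<!-- live broker, broker.xml (assumed sketch) -->
<ha-policy>
  <replication>
    <master>
      <!-- the backup declaring the same group-name replicates this live -->
      <group-name>pair-1</group-name>
      <check-for-live-server>true</check-for-live-server>
    </master>
  </replication>
</ha-policy>

<!-- its dedicated backup broker, broker.xml -->
<ha-policy>
  <replication>
    <slave>
      <group-name>pair-1</group-name>
    </slave>
  </replication>
</ha-policy>
```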

I'll let @michaelandrepearce comment if he has other useful suggestions too :)

adamw commented 3 years ago

Thanks for the link and the explanation :)

Right now, with the current quorum vote implementation, we are forced to have 3 live pairs, because backups won't participate in the vote.

Ok, so if we are testing current versions, the setup we have is "correct" if we want to have data replication & a split-brain-safe cluster?

franz1981 commented 3 years ago

Ok, so if we are testing current versions, the setup we have is "correct" if we want to have data replication & a split-brain-safe cluster?

Yes, although the backups on the other "witness" lives are not necessary (but can still exist) if you don't plan to distribute any work to them; they only participate in the quorum vote. And it's important to mention that message (re)distribution among lives shouldn't happen, as I've mentioned in the previous comment (including clients connecting to any live node of the cluster).

adamw commented 3 years ago

And it's important to mention that message (re)distribution among lives shouldn't happen, as I've mentioned in the previous comment (including clients connecting to any live node of the cluster).

Anything we should change in the current config to prevent that?

franz1981 commented 3 years ago

@adamw Yes, the address param on the cluster-connection should be set to some impossible, non-existent address, to prevent bindings and other notifications from being distributed between cluster nodes, and using OFF as the load-balancing policy too should do the trick!
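A minimal broker.xml sketch of that change; the cluster-connection name, the connector names and the bogus address value are placeholders, assuming the rest of the existing cluster definition stays as it is:

```xml
<!-- on each live broker, broker.xml (assumed sketch) -->
<cluster-connections>
  <cluster-connection name="my-cluster">
    <!-- bogus address that matches nothing, so bindings and other
         notifications for real addresses are not propagated -->
    <address>not.an.existing.address</address>
    <connector-ref>netty-connector</connector-ref>
    <!-- never move messages between live brokers -->
    <message-load-balancing>OFF</message-load-balancing>
    <static-connectors>
      <connector-ref>other-broker-connector</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>
```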

franz1981 commented 3 years ago

@adamw Let me know if the explanation was enough; I can try to help validate the config too :+1:

franz1981 commented 3 years ago

@adamw Any news on https://github.com/softwaremill/mqperf/pull/47#issuecomment-779791673? Is there something I can do to help, e.g. sending a PR to fix this?