msimonin / ombt-orchestrator

Framework to evaluate different message buses using oslo.messaging (via ombt)
GNU General Public License v3.0

How to scale the control-bus ? (was allow qdr as control bus) #55

Closed msimonin closed 6 years ago

msimonin commented 6 years ago

@jrbalderrama I had a second thought. We want to be able to scale an experiment (in terms of clients and servers), so the control-bus should be as transparent as possible. Methodology-wise, the control-bus broadcasts to the agents to start the benchmark or to retrieve the benchmark results. What sounds weird to me is evaluating oslo_messaging patterns on a bus while using the same patterns on the control-bus. We want the control-bus to be as scalable as possible regardless of the pattern under study.

I'd like to investigate a solution in which we shard the control-bus. We could add a new control-bus instance, together with a new controller, for every 5k agents (threshold tbd).

Results would need to be aggregated across controllers (and potentially topics). That seems possible.
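A minimal sketch of the sizing rule, assuming the 5k threshold above (still tbd) and counting only clients and servers as agents. The helper name `control_bus_count` is hypothetical and not part of ombt-orchestrator.

```python
import math

# Assumed threshold from the discussion: one control-bus instance
# (and one controller) per 5000 agents. The value is still tbd.
AGENTS_PER_CONTROL_BUS = 5000


def control_bus_count(nbr_clients, nbr_servers, per_bus=AGENTS_PER_CONTROL_BUS):
    """Return how many control-bus instances (and controllers) to deploy."""
    agents = nbr_clients + nbr_servers
    # Always keep at least one control-bus, even for tiny experiments.
    return max(1, math.ceil(agents / per_bus))


if __name__ == "__main__":
    print(control_bus_count(10000, 100))
```

With 10000 clients and 100 servers this rounds up to 3 control-bus instances, since 10100 agents do not fit in two shards of 5000.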

Implementation-wise a first step would be:

Things left :

msimonin commented 6 years ago

I have a POC with multiple controllers: https://github.com/msimonin/ombt-orchestrator/tree/poc_multi_controller

I tested 2 controllers, 25 clients, 25 servers and got this (test_case_1):

backup

msimonin@fnancy:~/ombt-orchestrator/current/backup$ ls *controller*
grisou-10.nancy.grid5000.fr_controller-0-topic_1-nbr_servers__25-nbr_clients__25-nbr_topics__1-call_type__rpc-call-nbr_calls__100-pause__0_docker.log
grisou-10.nancy.grid5000.fr_controller-0-topic_1-nbr_servers__25-nbr_clients__25-nbr_topics__1-call_type__rpc-call-nbr_calls__100-pause__0.log
grisou-11.nancy.grid5000.fr_controller-1-topic_1-nbr_servers__25-nbr_clients__25-nbr_topics__1-call_type__rpc-call-nbr_calls__100-pause__0_docker.log
grisou-11.nancy.grid5000.fr_controller-1-topic_1-nbr_servers__25-nbr_clients__25-nbr_topics__1-call_type__rpc-call-nbr_calls__100-pause__0.log

Controller 1

msimonin@fnancy:~/ombt-orchestrator/current/backup$ cat grisou-10.nancy.grid5000.fr_controller-0-topic_1-nbr_servers__25-nbr_clients__25-nbr_topics__1-call_type__rpc-call-nbr_calls__100-pause__0_docker.log
RPC call test results
13 RPC clients, 13 RPC Servers (26 total)

Aggregated RPC Client results:
------------------------------
Total Messages: 1300
[...]

Aggregated RPC Server results:
------------------------------
Total Messages: 1298
[...]

Controller 2

msimonin@fnancy:~/ombt-orchestrator/current/backup$ cat grisou-11.nancy.grid5000.fr_controller-1-topic_1-nbr_servers__25-nbr_clients__25-nbr_topics__1-call_type__rpc-call-nbr_calls__100-pause__0_docker.log
RPC call test results
12 RPC clients, 12 RPC Servers (24 total)

Aggregated RPC Client results:
------------------------------
Total Messages: 1200
[...]

Aggregated RPC Server results:
------------------------------
Total Messages: 1202
[...]
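
A rough sketch of how the per-controller counters could be aggregated, assuming the log layout shown in the two excerpts above. The helper names `total_messages` and `aggregate` are hypothetical, and only the "Total Messages" counters are summed; the elided statistics would need their own merge logic.

```python
import re

# Matches the "Aggregated RPC Client/Server results" header, the dashed
# rule below it, and the "Total Messages" line that follows.
TOTAL_RE = re.compile(
    r"Aggregated RPC (Client|Server) results:\n-+\nTotal Messages: (\d+)"
)


def total_messages(log_text):
    """Extract the client and server message counters from one controller log."""
    return {role.lower(): int(count) for role, count in TOTAL_RE.findall(log_text)}


def aggregate(logs):
    """Sum the message counters over all controller logs."""
    totals = {"client": 0, "server": 0}
    for log_text in logs:
        for role, count in total_messages(log_text).items():
            totals[role] += count
    return totals


# Excerpts copied from the two controller logs above (elisions kept as-is).
CONTROLLER_0 = """RPC call test results
13 RPC clients, 13 RPC Servers (26 total)

Aggregated RPC Client results:
------------------------------
Total Messages: 1300
[...]

Aggregated RPC Server results:
------------------------------
Total Messages: 1298
[...]
"""

CONTROLLER_1 = """RPC call test results
12 RPC clients, 12 RPC Servers (24 total)

Aggregated RPC Client results:
------------------------------
Total Messages: 1200
[...]

Aggregated RPC Server results:
------------------------------
Total Messages: 1202
[...]
"""

if __name__ == "__main__":
    print(aggregate([CONTROLLER_0, CONTROLLER_1]))
```

Summed over both controllers, the clients and the servers each account for 2500 messages (1300 + 1200 and 1298 + 1202), which matches 25 clients issuing 100 calls each.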
msimonin commented 6 years ago

Some ideas. The main thing is that all the agents (client/server/controller) are attached to the same bus, but not necessarily to the same control-bus.

msimonin commented 6 years ago

Follow up. We can add a --shards option to each test_case_* task and change the test_case function so that it takes a list of "sharded" agents. Agents in two different shards share the test-bus but not the control-bus.

Example:

# Single large distributed target

# we need to replicate the controllers
test_case_1(nbr_clients=10000, nbr_servers=100, shards=5) -> test_case([
(2000, 20, 1),
(2000, 20, 1),
(2000, 20, 1),
(2000, 20, 1),
(2000, 20, 1)
])
# Multiple distributed targets

test_case_2(nbr_topics=10000, shards=5) -> test_case([
(2000, 2000, 2000),
(2000, 2000, 2000),
(2000, 2000, 2000),
(2000, 2000, 2000),
(2000, 2000, 2000)
])
# One single large distributed fanout
test_case_3(nbr_servers=10000, shards=5) -> test_case([
(1, 2000, 1),
(1, 2000, 1),
(1, 2000, 1),
(1, 2000, 1),
(1, 2000, 1)
])
# Multiple distributed fanouts
test_case_4(nbr_clients=1, nbr_servers=10, nbr_topics=1000, shards=5) -> test_case([
(200, 1000, 200),
(200, 1000, 200),
(200, 1000, 200),
(200, 1000, 200),
(200, 1000, 200)
])
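
The first two mappings above could be computed along these lines. This is only a sketch: `split_evenly`, `shard_test_case_1` and `shard_test_case_2` are hypothetical helper names, and the tuple order is assumed to be (clients, servers, topics).

```python
def split_evenly(total, shards):
    """Split `total` into `shards` integer parts whose sizes differ by at most one."""
    base, extra = divmod(total, shards)
    # The first `extra` shards each take one additional item.
    return [base + (1 if i < extra else 0) for i in range(shards)]


def shard_test_case_1(nbr_clients, nbr_servers, shards):
    """Single target: clients and servers are split, the single topic is shared."""
    clients = split_evenly(nbr_clients, shards)
    servers = split_evenly(nbr_servers, shards)
    return [(c, s, 1) for c, s in zip(clients, servers)]


def shard_test_case_2(nbr_topics, shards):
    """Multiple targets: one client and one server per topic, topics are split."""
    return [(t, t, t) for t in split_evenly(nbr_topics, shards)]


if __name__ == "__main__":
    print(shard_test_case_1(10000, 100, 5))
    print(shard_test_case_2(10000, 5))
```

When the count does not divide evenly, the remainder goes to the first shards: `split_evenly(25, 2)` gives `[13, 12]`, the same 13/12 split reported by the two controllers in the POC run.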

Instead of specifying a tuple with the number of agents of each type, we could precompute, for each agent, the topics it is interested in, e.g. (200, 1000, 200) -> [range(200), range(200)*5, range(200)]
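
A possible sketch of that precomputation, assuming agents are assigned to topics round-robin and that the tuple order is (clients, servers, topics). The function name `topics_per_agent` is hypothetical. Note that `range(200)*5` only works on Python 2 lists; in Python 3 the equivalent is `list(range(200)) * 5`.

```python
def topics_per_agent(nbr_clients, nbr_servers, nbr_topics):
    """Return [client_topics, server_topics, topics] for one shard.

    Entry i of client_topics (resp. server_topics) is the topic index
    used by client i (resp. server i); topics are assigned round-robin.
    """
    client_topics = [i % nbr_topics for i in range(nbr_clients)]
    server_topics = [i % nbr_topics for i in range(nbr_servers)]
    return [client_topics, server_topics, list(range(nbr_topics))]


if __name__ == "__main__":
    clients, servers, topics = topics_per_agent(200, 1000, 200)
    # 200 clients, 1000 servers (5 per topic), 200 topics
    print(len(clients), len(servers), len(topics))
```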

msimonin commented 6 years ago

Sharding is implemented in https://github.com/msimonin/ombt-orchestrator/pull/93

msimonin commented 6 years ago

merged