rebus-org / Rebus

:bus: Simple and lean service bus implementation for .NET
https://mookid.dk/category/rebus

What is the recommended approach for implementing multi-tenancy #889

Closed evaldas-raisutis closed 4 years ago

evaldas-raisutis commented 4 years ago

Hi,

Consider the following scenario.

We are importing product data from multiple stores into RabbitMQ. For simplicity's sake, say we have three message types: CreateProductCommand, UpdateProductCommand and RemoveProductCommand. Each message carries a shop ID that identifies the shop, and this shop ID can be put in the headers of the message when it is sent (bus.Send) to RabbitMQ.

We also have a consumer background service, which handles incoming messages with a configuration of 1 worker and a parallelism of 1. This is because we cannot process multiple messages concurrently if they belong to the same shop ID. As such, I'm looking for a way to set up our background service to process messages belonging to different shops concurrently:

How can I configure Rebus to dynamically set the worker count and configure each worker to process only messages with specific header values?

I found this post on Stack Overflow, which indicates that this is not possible: https://stackoverflow.com/questions/26214909/configure-rebus-so-that-if-the-message-contains-an-worker-id-only-that-worker-w , but it's quite old.

If this indeed is not possible, then what would be the recommendation? I could dynamically create multiple Bus instances at startup and subscribe each to a specific queue (possibly named after the shop id) and then make sure I'm routing messages to correct queues. However, as I understand it, it's not recommended to have multiple Bus instances running.

Another option could be to run multiple background service instances, each with its own input queue. However, we can expect the shop count to change dynamically, so this would not be ideal, because we would have to manage the number of background service instances in relation to the shop count.

Finally, I could also decorate each handler with something along the lines of a PreventConcurrentShopHandlerDecorator, which would track whether a given shop ID is already being processed and re-queue any subsequent messages with the same shop ID.
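The decorator idea can be sketched roughly as follows. This is a language-neutral illustration, not Rebus code: the `shop-id` header name, the dict-shaped message and the `defer` callback (standing in for whatever re-queues the message) are all hypothetical.

```python
import threading
from typing import Callable, Set

# Hypothetical header name; the issue only says the shop ID travels in a header.
SHOP_ID_HEADER = "shop-id"


class PreventConcurrentShopHandler:
    """Sketch of the decorator idea: run the wrapped handler unless a message for
    the same shop is already being handled in this process; otherwise pass the
    message to a `defer` callback (hypothetical stand-in for re-queuing it)."""

    def __init__(self, inner: Callable[[dict], None], defer: Callable[[dict], None]) -> None:
        self._inner = inner
        self._defer = defer
        self._in_flight: Set[str] = set()
        self._lock = threading.Lock()

    def handle(self, message: dict) -> bool:
        """Return True if the message was handled, False if it was deferred."""
        shop_id = message["headers"][SHOP_ID_HEADER]
        with self._lock:
            busy = shop_id in self._in_flight
            if not busy:
                self._in_flight.add(shop_id)  # claim the shop while holding the lock
        if busy:
            self._defer(message)
            return False
        try:
            self._inner(message)
            return True
        finally:
            with self._lock:
                self._in_flight.discard(shop_id)  # release the claim even if the handler fails
```

Two caveats follow from the design: the in-flight set only covers a single process, so it does not help once several instances consume the same queue, and a re-queued message may come back after later messages for the same shop, so per-shop ordering is not preserved.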

The reason we need to prevent multiple messages from the same shop from being processed concurrently is that message handlers will sometimes create a new database entry, which has its own identity. If multiple handlers try to create the same entry, there will be a deadlock or a concurrency error at the database level.

Another example is when we push different kinds of messages from message handlers to an external API. This API enforces a per-shop request rate limit, which means sending data to the API for a specific shop can be quite slow, so it would be more efficient to do this for multiple shops in parallel.
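The efficiency argument can be illustrated with a small asyncio sketch: messages are grouped per shop, each shop's messages are sent strictly one after another, and different shops proceed concurrently. The `asyncio.sleep` call is a hypothetical stand-in for waiting out a shop's rate limit, and the batch-style grouping is a simplification of a continuous message stream.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List, Tuple


async def push_shop_messages(shop_id: str, bodies: List[str],
                             log: List[Tuple[str, str]], min_interval: float) -> None:
    """Send one shop's messages strictly one after another."""
    for body in bodies:
        await asyncio.sleep(min_interval)  # stand-in for waiting out the shop's rate limit
        log.append((shop_id, body))        # stand-in for the actual API call


async def push_all(messages: List[Tuple[str, str]],
                   min_interval: float = 0.05) -> List[Tuple[str, str]]:
    """Group (shop_id, body) pairs by shop; shops run concurrently, each shop serially."""
    by_shop: Dict[str, List[str]] = defaultdict(list)
    for shop_id, body in messages:
        by_shop[shop_id].append(body)  # keeps arrival order within a shop
    log: List[Tuple[str, str]] = []
    await asyncio.gather(*(push_shop_messages(s, b, log, min_interval)
                           for s, b in by_shop.items()))
    return log
```

With three shops of three messages each and a 0.05 s interval, the total time is roughly that of the longest shop (about 0.15 s) rather than the sum of all nine calls (about 0.45 s).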

Any and all help/feedback is appreciated :)

evaldas-raisutis commented 4 years ago

Digging through some of the codebase and the RabbitMQ documentation, I now realize this might be a better fit for the Rebus.RabbitMq repository. I'll leave it here for now, though, because I'm curious what such a "multi-tenancy" setup might look like with Rebus, regardless of transport.

Additionally, what I was referring to by "Workers" is essentially the number of worker threads set up by Rebus. I suppose dedicating worker threads to specific tenants (shop IDs) would not be ideal either.

Rebus aside, it seems the recommended approach to multi-tenancy in RabbitMQ (and perhaps in any message broker, for that matter) is to have a dedicated virtual host on the broker (or an entirely separate broker instance) per tenant. That's easy enough, since virtual hosts and brokers can be managed through an API. The dilemma, however, is sending or publishing messages to the appropriate virtual host (in the case of RabbitMQ; for other transports, the appropriate connection), because virtual hosts are tied to connections, and connections are tied to the transport implementation in Rebus.

I suppose what I'm really looking for, is to be able to choose a specific connection endpoint from a collection of connection endpoints when sending/publishing a message.

Looking at https://github.com/rebus-org/Rebus.RabbitMq/blob/master/Rebus.RabbitMq/Internals/ConnectionManager.cs#L63 it seems that it just picks the first available endpoint.

Sorry for the open monologue. I'll see if I can find a way to extend the ConnectionManager that allows me to add an endpoint per tenant (each with a unique virtual host) and then make sure messages are sent to the correct endpoint based on a header value.
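The lookup such an extension would need can be sketched independently of any library. This is neither Rebus.RabbitMq's ConnectionManager nor RabbitMQ.Client's endpoint resolution; the `shop-id` header, the `shop-<id>` virtual host naming convention and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping
from urllib.parse import quote

# Hypothetical header name, for illustration only.
SHOP_ID_HEADER = "shop-id"


@dataclass(frozen=True)
class Endpoint:
    host: str
    virtual_host: str

    def amqp_uri(self) -> str:
        # In an AMQP URI the virtual host is the percent-encoded path segment.
        return f"amqp://{self.host}/{quote(self.virtual_host, safe='')}"


def build_endpoints(shop_ids: Iterable[str], host: str) -> Dict[str, Endpoint]:
    """One virtual host per shop, named 'shop-<id>' (hypothetical convention)."""
    return {shop_id: Endpoint(host, f"shop-{shop_id}") for shop_id in shop_ids}


def resolve_endpoint(headers: Mapping[str, str],
                     endpoints: Mapping[str, Endpoint]) -> Endpoint:
    """Pick the endpoint for an outgoing message from its shop ID header."""
    shop_id = headers.get(SHOP_ID_HEADER)
    if shop_id is None:
        raise KeyError(f"message has no '{SHOP_ID_HEADER}' header")
    if shop_id not in endpoints:
        raise KeyError(f"no endpoint registered for shop '{shop_id}'")
    return endpoints[shop_id]
```

Failing loudly on a missing header or an unknown shop is a deliberate choice here: silently falling back to a default endpoint (as picking the first available endpoint would) could deliver one tenant's messages into another tenant's virtual host.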

It seems RabbitMQ.Client has a low-level extension point that can be used to pick a specific host name (now if I could just inject a message context into my own implementation of IEndpointResolver...): https://github.com/rabbitmq/rabbitmq-dotnet-client/issues/195

mookid8000 commented 4 years ago

Hi @evaldas-raisutis , thanks for your question and your reflections. I'll give you a really short answer:

Another option could be to run multiple background service instances, each with its own input queue.

I think that's what I would do. 🙂

From what you've described so far, I think I would go for a solution where a router forwards each message to a numbered worker based on a hash of its shop ID (e.g. a Knuth multiplicative hash modulo N, where N is the number of worker instances you have running).
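A minimal sketch of the routing rule, assuming string shop IDs and a hypothetical `worker-<n>` queue naming scheme. CRC-32 is used here only to turn the string into a stable integer key (Python's built-in `hash()` of a string is randomized per process), which is then run through a Knuth-style multiplicative hash and taken modulo N.

```python
import zlib

# Knuth's multiplicative constant: roughly 2**32 divided by the golden ratio.
KNUTH_MULTIPLIER = 2654435761


def knuth_hash(key: int) -> int:
    """Knuth-style multiplicative hash, kept to 32 bits."""
    return (key * KNUTH_MULTIPLIER) % 2**32


def worker_index(shop_id: str, worker_count: int) -> int:
    """Map a shop ID to a worker number in [0, worker_count)."""
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    # Stable integer key, so every router instance computes the same result.
    key = zlib.crc32(shop_id.encode("utf-8"))
    return knuth_hash(key) % worker_count


def queue_name(shop_id: str, worker_count: int) -> str:
    # Hypothetical naming scheme for the workers' input queues.
    return f"worker-{worker_index(shop_id, worker_count)}"
```

Two properties are worth keeping in mind. Because the multiplier is odd, when N is a power of two the modulo simply keeps the key's low bits, so the spread then depends on the key itself (Knuth's method proper takes the high bits instead, i.e. `(knuth_hash(key) * N) >> 32`). And changing N remaps most shops to a different worker, so queued messages for a shop should probably be drained before the worker count is changed, or two workers could end up processing the same shop concurrently.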

A "worker" would be a Rebus instance with its own input queue. Each worker would then process messages in a serial fashion, ensuring that there's no concurrent processing of a tenant's messages.
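The router/worker arrangement can be simulated in-process to show why it gives per-tenant serial processing: every message for a shop lands in the same input queue, and each worker drains its queue one message at a time. This is only a model; in-memory `queue.Queue` objects and threads stand in for broker queues and Rebus instances, and the routing rule mirrors the CRC-32 plus multiplicative hash modulo N idea described above.

```python
import queue
import threading
import zlib
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (shop_id, body)


def worker_index(shop_id: str, worker_count: int) -> int:
    # Stable hash of the shop ID, modulo the number of workers.
    return ((zlib.crc32(shop_id.encode("utf-8")) * 2654435761) % 2**32) % worker_count


def run(messages: List[Message], worker_count: int) -> Dict[int, List[Message]]:
    """Route messages to per-worker input queues; each worker drains its queue serially."""
    queues = [queue.Queue() for _ in range(worker_count)]
    processed: Dict[int, List[Message]] = {i: [] for i in range(worker_count)}

    def worker(i: int) -> None:
        while True:
            msg = queues[i].get()
            if msg is None:  # sentinel: no more messages
                return
            processed[i].append(msg)  # one message at a time, in queue order

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for msg in messages:  # the router: all messages of a shop go to the same queue
        queues[worker_index(msg[0], worker_count)].put(msg)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return processed
```

Different shops hashed to different workers are processed in parallel, while two shops that hash to the same worker simply share its serial lane.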

It's totally OK to host multiple Rebus instances in the same process, as long as they each have their own IoC container – in fact, it's an approach that I often recommend, because it's easy to deploy, and it still encourages you to keep things separate, which makes it easier to deploy them as physically separate things later on.

I'll close this issue for now, but please keep the thread going if you have comments or additional questions. 🙂