monkey-projects / monkeyci

Next-generation CI/CD tool that uses the full power of Clojure!
https://monkeyci.com
GNU General Public License v3.0

Messaging solutions #37

Closed wneirynck closed 6 months ago

wneirynck commented 1 year ago

Since MonkeyCI is being designed as a (potentially) distributed system, we'll be using messages extensively. We need to decide on which messaging system will be the most appropriate for our use. As I see it, there are three possible paths to follow:

  1. Self-managed (ActiveMQ, RabbitMQ,...)
  2. Public cloud platform (OCI streaming, GCP pub/sub,...)
  3. Third party (IronMQ,...)

Let's discuss these various options here.

Self-managed

Setting up our own solution gives us a lot of freedom to choose and configure. However, it also brings the added burden of having to maintain and monitor it ourselves: if a problem arises, we have to solve it on our own. We also need to host it ourselves, which means running it as a container, with additional costs.

Public Cloud

Most public clouds offer some sort of messaging solution.

Depending on the cloud provider, the tools may or may not have a feature set that matches our requirements. For example, OCI does not seem to offer a good messaging solution, while GCP Pub/Sub may be usable. Pricing varies from (partially) free to reasonably cheap, although intensive use may become more costly.

Third party tools

Third-party tools like IronMQ would most likely match our requirements best. There may be providers out there that allow us some limited free use. The main advantage is that messaging is their core business, so we may expect them to do a good job. A possible downside is less tight integration with a specific cloud provider, although they do support all major clouds and protocols out of the box.

Conclusion

Further investigation of the available tools is required, but I would say that hosting our own tool is to be avoided: it would take a lot more effort to maintain, which is something we absolutely want to avoid. The public cloud offerings are limited, especially if we don't want to mix tools from different clouds. So I think the third-party solutions are the first option we should pursue.

wneirynck commented 1 year ago

After investigating the existing SaaS providers, I've noticed that the advantage is limited. Pandio offers a free trial, but it doesn't work. Iron.io has a working trial, but it only seems to support queues, not events (in which case OCI Queues are better for us). Solace has the functionality we need, but pricing information is absent (the same goes for Iron.io, by the way). Regarding pricing, they all seem to want to keep it as secret as possible. It seems this kind of service is not a big market; I assume most companies either work with dedicated clusters or manage their own. I will now investigate the public clouds (OCI and GCP).

wneirynck commented 1 year ago

Iron.io gave pricing info. We could use either the free plan, which limits us to about 2 requests/sec, or a paid plan, which is very expensive (starting at $400/month). I assume the competitors will have similar prices.

Looking into the offerings of GCP and OCI, I've seen that OCI has Streaming, which does not support many clients. This is probably similar to GCP Pub/Sub, where each message counts toward a limit. In any case, these are not intended for sending events to many clients at once. I was hoping to use a service like this to allow CLIs to view events, but that is probably a bad idea.

Instead I've split up the event use cases into these domains:

  1. Incoming webhooks that result in a new build.
  2. Events produced by various parts of the application, that could be consumed by one or more other parts.
  3. Events to be sent to ephemeral clients.

Webhooks

These could actually be handled by a queue processor that in turn triggers a cloud function. This function could do some initial checks, like verifying that the repository in question actually has a build script available, and that the webhook is valid and linked to a customer. It would then start a container instance to actually execute the pipeline.

Both OCI and GCP have a way to automatically invoke a function when an event like this is received.
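As a sketch of those initial checks, validating the webhook signature before spinning anything up could look like the following. This is an illustration only, not MonkeyCI's actual code: the namespace and function names are made up, and the `sha256=...` HMAC header scheme is an assumption (it is what e.g. GitHub webhooks use).

```clojure
(ns sketch.webhook
  (:import [javax.crypto Mac]
           [javax.crypto.spec SecretKeySpec]))

(defn hmac-sha256
  "Hex-encoded HMAC-SHA256 of the payload, keyed with the webhook secret."
  [^String secret ^String payload]
  (let [mac (doto (Mac/getInstance "HmacSHA256")
              (.init (SecretKeySpec. (.getBytes secret "UTF-8") "HmacSHA256")))]
    (apply str (map #(format "%02x" %) (.doFinal mac (.getBytes payload "UTF-8"))))))

(defn valid-signature?
  "Compare the signature header sent by the webhook caller against the
   one we compute ourselves from the raw request body."
  [secret payload signature-header]
  (= signature-header (str "sha256=" (hmac-sha256 secret payload))))
```

Only after such a check passes (and the repository is confirmed to have a build script) would the function start the build container.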

Service to Service Events

Internally consumed events could use the cloud-provided event system (e.g. OCI Streaming or GCP Pub/Sub). In these cases the consumers would have to be long-running processes (container instances) or functions. Functions are interesting for less frequently sent events.

Client Events

Clients are possibly short-running and there can be many of them, and it's no disaster if an event is missed. They also require more fine-grained filtering and security: events for one customer should not be sent to another one. So in this case it would be better to just let them call an HTTP endpoint and use server-sent events (SSE). This is also how events are currently implemented: the client event dispatcher is a long-running process with an HTTP interface that consumes cloud events.
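To make the SSE dispatch concrete, here is a minimal sketch of filtering events per customer and rendering them in SSE wire format (`event:` and `data:` lines, terminated by a blank line, per the EventSource spec). The event shape and function names are assumptions, not the current implementation:

```clojure
(ns sketch.sse)

(defn event->sse
  "Render one event in SSE wire format.  The dispatcher would write
   this string to the client's open HTTP response."
  [{:keys [type] :as evt}]
  (str "event: " (name type) "\n"
       "data: " (pr-str (dissoc evt :type)) "\n\n"))

(defn for-customer?
  "Only events belonging to the authenticated customer may be dispatched."
  [customer-id evt]
  (= customer-id (:customer-id evt)))

(defn events-for
  "Filter and render the events a given customer is allowed to see."
  [customer-id events]
  (->> events
       (filter (partial for-customer? customer-id))
       (map event->sse)))
```

Because the filtering happens in the dispatcher, the underlying cloud event stream can stay coarse-grained while the HTTP layer enforces per-customer isolation.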

Conclusion

It would be more cost-efficient to split up the architecture as described above and use cloud-provided event systems. Since we're currently using OCI, I will proceed with OCI Streaming and functions.

wneirynck commented 7 months ago

It turns out OCI Streaming is also quite expensive. I'm currently using ZeroMQ instead. It is only a library, so it requires coding, but it's also very versatile and gives us complete freedom. It can also be used for other things, like log and blob streaming.
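For reference, the "it requires coding" part with ZeroMQ looks roughly like this pub/sub sketch via the JeroMQ Java library. This assumes `org.zeromq/jeromq` on the classpath; the address and the two-frame topic/payload convention are choices made for the example, not MonkeyCI's actual wiring:

```clojure
(ns sketch.zmq
  (:import [org.zeromq ZContext SocketType]))

(defn event->frames
  "An event is sent as two frames: the topic (event type, used for
   subscription filtering) and the payload."
  [{:keys [type] :as evt}]
  [(name type) (pr-str evt)])

(defn publish-events
  "Publisher side.  ZeroMQ is brokerless: the publisher binds an
   address and subscribers connect to it directly."
  [addr events]
  (with-open [ctx (ZContext.)]
    (let [sock (doto (.createSocket ctx SocketType/PUB)
                 (.bind addr))]
      (doseq [evt events]
        (let [[topic payload] (event->frames evt)]
          (.sendMore sock topic)
          (.send sock payload))))))

(defn receive-one
  "Subscriber side: receive a single event for the given topic prefix."
  [addr topic]
  (with-open [ctx (ZContext.)]
    (let [sock (doto (.createSocket ctx SocketType/SUB)
                 (.connect addr)
                 (.subscribe (.getBytes topic)))]
      {:topic (.recvStr sock)
       :event (read-string (.recvStr sock))})))
```

The flexibility is real (any topology, no broker to run), but so is the coding cost: reconnection, slow-joiner handling, and delivery guarantees are all on us.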

wneirynck commented 6 months ago

ZeroMQ also has its issues. It would probably be possible to make it work in a stable fashion, but since we currently don't need that kind of flexibility, and we want to make as much progress as possible, I have switched to ActiveMQ Artemis instead. This does mean another third-party tool to manage, but it currently works and does what we need. It could be that we replace it with a custom solution or cloud streaming later on.
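For completeness, talking to Artemis through the plain JMS API looks roughly like the sketch below. This is not the MonkeyCI code: it assumes `org.apache.activemq/artemis-jms-client` on the classpath, and the broker URL and queue name are placeholders.

```clojure
(ns sketch.artemis
  (:import [org.apache.activemq.artemis.jms.client ActiveMQConnectionFactory]
           [javax.jms Session]))

(defn send-event!
  "Publish a single event (as an EDN text message) to a queue on the broker."
  [broker-url queue-name evt]
  (with-open [conn (.createConnection (ActiveMQConnectionFactory. broker-url))]
    (let [session  (.createSession conn false Session/AUTO_ACKNOWLEDGE)
          queue    (.createQueue session queue-name)
          producer (.createProducer session queue)]
      (.send producer (.createTextMessage session (pr-str evt))))))

(defn receive-event
  "Receive one event from a queue, or nil if none arrives within timeout-ms."
  [broker-url queue-name timeout-ms]
  (with-open [conn (doto (.createConnection (ActiveMQConnectionFactory. broker-url))
                     ;; consuming requires the connection to be started
                     (.start))]
    (let [session  (.createSession conn false Session/AUTO_ACKNOWLEDGE)
          consumer (.createConsumer session (.createQueue session queue-name))]
      (some-> (.receive consumer timeout-ms)
              (.getText)
              (read-string)))))
```

Compared to the ZeroMQ approach, the broker handles persistence, acknowledgements, and reconnection, at the cost of being one more process to run and manage.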