Closed by wneirynck 6 months ago
After investigating the existing SaaS providers, I've noticed that their advantage is limited. Pandio offers a free trial, but it doesn't work. Iron.io has a working trial, but it seems to support only queues, not events (in which case OCI queues are better for us). Solace has the functionality we need, but pricing information is absent (same for Iron.io, by the way). As for pricing, they all seem to want to keep it as secret as possible. This kind of service does not seem to be a big market; I assume most companies either work with dedicated clusters or manage their own. I will now investigate public clouds (OCI and GCP).
Iron.io gave pricing info. We could use either the free tier, which limits us to about 2 requests/sec, or a paid plan, which is very expensive (starts at $400/month). I'm assuming the competitors will have similar prices.
Looking into the offerings for GCP and OCI, I've seen that OCI has Streaming, which does not support many clients. This is probably similar to GCP Pub/Sub, where each message counts toward a limit. In any case, these services are not intended for sending events to many clients at once. I was hoping to use a service like this to allow CLIs to view events, but this is probably a bad idea.
Instead I've split up the event use cases into these domains: incoming webhooks, internally consumed events, and client events.
Incoming webhooks could actually be handled by a queue processor that in turn triggers a cloud function. This function could do some initial checks, such as verifying that the repository in question actually has a build script available and that the webhook is valid and linked to a customer. Then it would start a container instance to actually execute the pipeline.
Both OCI and GCP have a way to automatically invoke a function when an event like this is received.
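The initial checks described above could be sketched roughly as follows. This is only an illustration in Python, not the actual MonkeyCI code: the `Webhook` record, the `WEBHOOKS` lookup table, the `BUILD_SCRIPT` path, and the `start_container` callback are all hypothetical names, and verifying the webhook with an HMAC-SHA256 signature is an assumption about what "valid" means here.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Webhook:
    id: str
    customer_id: str
    secret: bytes


# Hypothetical values for illustration only; a real function would look
# webhooks up in a database and use the project's actual script location.
BUILD_SCRIPT = ".ci/build"
WEBHOOKS = {"wh-1": Webhook("wh-1", "cust-1", b"s3cret")}


def valid_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Compare the received signature with an HMAC-SHA256 of the body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def handle_webhook(webhook_id, body, signature, repo_files, start_container):
    """Run the cheap checks first, and only then start a container."""
    webhook = WEBHOOKS.get(webhook_id)
    if webhook is None:
        # The webhook is not linked to any customer.
        return "rejected: unknown webhook"
    if not valid_signature(webhook.secret, body, signature):
        return "rejected: invalid signature"
    if BUILD_SCRIPT not in repo_files:
        # Nothing to run, so no container is needed.
        return "skipped: no build script"
    start_container(webhook.customer_id, body)
    return "started"
```

Keeping the checks in a short-lived function means a container instance is only paid for when there is actually a pipeline to execute.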
Internally consumed events could use the cloud-provided event system (e.g. OCI Streaming or GCP Pub/Sub). In these cases the consumers would have to be long-running processes (container instances) or functions. Functions are interesting for events that are sent less frequently.
Clients are possibly short-running and there can be many of them. It's also no disaster if an event is missed. They also require more fine-grained filtering and security: events for one customer should not be sent to another one. So in this case it would be better to just let them call an HTTP endpoint and use SSE. This is also how events are currently implemented. In this case the client event dispatcher is a long-running process with an HTTP interface that consumes cloud events.
It would be more cost-efficient to split up the architecture as described above and use cloud-provided event systems. Since we're currently using OCI, I will proceed with OCI Streaming and functions.
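A minimal sketch of the filtering part of such a client event dispatcher, assuming each event is a dict with hypothetical `type` and `customer_id` keys. The class and function names are made up for illustration, and the HTTP layer that would write the frames to each open SSE response is left out.

```python
import json
import queue


def to_sse(event: dict) -> str:
    """Format an event as a server-sent event frame (blank line terminates it)."""
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"


class ClientEventDispatcher:
    """Fans incoming events out to connected clients, filtered per customer."""

    def __init__(self):
        self._clients = []  # list of (customer_id, queue) pairs

    def subscribe(self, customer_id: str, max_pending: int = 100) -> queue.Queue:
        """Register a client; the HTTP handler would stream from the returned queue."""
        q = queue.Queue(maxsize=max_pending)
        self._clients.append((customer_id, q))
        return q

    def dispatch(self, event: dict) -> None:
        frame = to_sse(event)
        for customer_id, q in self._clients:
            if customer_id != event["customer_id"]:
                continue  # never send one customer's events to another
            try:
                q.put_nowait(frame)
            except queue.Full:
                pass  # slow client: dropping is acceptable, a missed event is no disaster
```

Using a bounded queue per client is one way to keep a slow or disconnected client from blocking the others, at the cost of occasionally dropping events for that client.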
It turns out OCI Streaming is also quite expensive, so I'm currently using ZeroMQ. This is only a library, so it requires coding. But it's also very versatile and gives us complete freedom. It can also be used for other things, like log and blob streaming.
ZeroMQ also has its issues. It will probably be possible to make it work in a stable fashion, but since we currently don't need that kind of flexibility, and we want to make as much progress as possible, I have switched to ActiveMQ Artemis instead. This does mean another third-party tool to manage, but currently it works and does what we need. We may replace it with a custom solution or cloud streaming later on.
Since MonkeyCI is being designed as a (potentially) distributed system, we'll be using messages extensively. We need to decide which messaging system is the most appropriate for our use. As I see it, there are three possible paths to follow: self-managed, public cloud, or third-party tools.
Let's discuss these various options here.
Self-managed
Pros:
Cons:
Setting up our own solution gives us a lot of freedom to choose and to configure. However, this also means the added burden of having to maintain and monitor it ourselves. If a problem arises, we have to solve it on our own. We also need to host it ourselves, which means running it as a container, with additional costs.
Public Cloud
Most public clouds offer some sort of messaging solution.
Pros:
Cons:
Depending on the cloud provider, the tools may or may not have a feature set that matches our requirements. For example, OCI does not seem to offer a good messaging solution. GCP Pub/Sub may be usable. Pricing can vary from (partially) free to reasonably cheap. With intensive use this may become more costly.
Third party tools
Pros:
Cons:
Third-party tools, like IronMQ, would most likely match our requirements best. There may be providers out there that allow us some limited free use. The main advantage is that this is their core business, so we may expect them to do a good job. The downside could be that they don't have good integration with cloud providers, but they will support all major clouds and protocols out of the box.
Conclusion
Further investigation of the available tools is required, but I would say that hosting our own tool is to be avoided. It would take a lot more effort to maintain, which is something we absolutely want to avoid. The public cloud offerings are limited, especially if we don't want to mix cloud tools. So I think the third-party solution is the first option we should pursue.