supabase-community / supabase-kubernetes

Helm 3 charts to deploy Supabase on Kubernetes
Apache License 2.0
367 stars 105 forks

Supabase-Kubernetes Roadmap: Feature Prioritization and Enhancements #53

Open arpagon opened 3 months ago

arpagon commented 3 months ago

This project is in its early stages. Let's use this issue to brainstorm and outline the essential components of our Supabase-Kubernetes Helm chart. Here's what we can discuss:

Additionally, we'd love your input on:

arpagon commented 3 months ago

@AntonOfTheWoods, @koryonik, @heresandyboy, and @drpsyko101 - We'd love your participation in shaping the roadmap for the Supabase-Kubernetes Helm chart! Your knowledge and experience would be invaluable as we outline its future.

@drpsyko101, a special thank you for your work on pull request #48. Your work on the Helm chart is a solid foundation to continue building upon. This is an essential step in making the Supabase-Kubernetes Helm chart a valuable tool for the community.

drpsyko101 commented 3 months ago

@arpagon Thanks for the PR #48 merge. It is a long-needed update for this chart. As for what we can do for the future roadmap of this chart, I'd think we can tackle these:

As for QoL improvements, we need to decide whether we should use better/reusable charts or improve upon the existing YAML topology, as pointed out by @AntonOfTheWoods. This could very well determine the path this chart takes going forward. In my opinion, we should stick with the current implementation and improve it over time, since we aren't deploying each of the Supabase services separately like other chart maintainers are doing. We can ~~steal~~ take inspiration from their code and implement a similar system in ours.

AntonOfTheWoods commented 3 months ago

@drpsyko101 I think that is a great checklist, however I would challenge the placement of multi-node support. That should be prioritised first, and done before any other work is even planned, because otherwise the chart can only produce toy deployments.

Kubernetes introduces a massive amount of complexity over single-node systems like plain compose but it brings phenomenal benefits - mainly because you can scale from a single node on a dev or CI/CD box to 1000-node clusters with relative ease. That is precisely what helm charts help you do in an elegant and maintainable way. That also brings in significant complexity over compose though...

I think it is very important to make it clear whether this chart is meant for (hobbyist/dev) single-node clusters (minikube, microk8s) or whether it will be usable for large-scale, high-performance clusters. The current postgresql defaults to a Deployment that doesn't have any sort of persistence. That means if you stop and restart the cluster, the data disappears. This is exactly the sort of thing that charts like bitnami's (or the operators you mention) default to something sane for. Writing charts that scale and do what reasonably sized deployments need takes a lot of time, effort, and expertise, and reinventing the wheel should only be done if you know you have the time and expertise to do better! :-)
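
As a sketch of the kind of sane default being asked for here, a persistence toggle in the chart's `values.yaml` might look like the following. All key names are hypothetical illustrations, not the chart's actual schema:

```yaml
# Hypothetical values.yaml override -- key names are illustrative only.
db:
  # Run postgres as a StatefulSet instead of a bare Deployment
  workloadKind: StatefulSet
  persistence:
    enabled: true              # without this, data is lost on pod restart
    storageClassName: standard
    size: 10Gi
```

With a default like this, a `helm upgrade` or cluster restart keeps the database volume around instead of silently discarding data.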

drpsyko101 commented 3 months ago

@AntonOfTheWoods I understand your concerns about putting multi-node support in the short-term goals. But scaling up services other than vector also mandates high-availability support. It is especially difficult to move postgres and storage/minio to a StatefulSet due to their complex replication setup, hence placing them in the long-term goals.

However, it is a different story if the user brings an external HA postgres and minio. Then it is viable to run vector as a DaemonSet, as you've mentioned above. From what I can see, there shouldn't be many changes needed to other resources to do so.
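
For concreteness, the external-HA path described here could be expressed as a values override along these lines. All key names are hypothetical, since the chart's actual values schema may differ:

```yaml
# Hypothetical values.yaml sketch: bring-your-own HA Postgres/MinIO,
# with vector running per-node as a DaemonSet. Keys are illustrative.
db:
  enabled: false               # skip the bundled single-node postgres
  external:
    host: postgres-ha.db.svc.cluster.local
    port: 5432
    existingSecret: supabase-db-credentials
minio:
  enabled: false
  external:
    endpoint: https://minio.storage.svc.cluster.local:9000
vector:
  kind: DaemonSet              # one log collector per node
```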

bartekus commented 3 months ago

As a consumer of this setup (specifically the revision that @drpsyko101 recently contributed), I would reckon that the setup Supabase provides (docker-compose based), which has been replicated for cloud providers like DigitalOcean, fills the hobby/small-segment requirements quite well. As for the Kubernetes setup, I would venture that nobody is going to use anything in production that has Postgres as a fundamental requirement and does not provide an HA variant as the general target. Even our DevOps team was OK with single-node for an initial trial, but flagged the lack of HA Postgres as essentially a compliance killer that would prevent the setup from ever being approved as production grade or used beyond the scope of POC/MVP development. Helm charts are what our dev team uses, so no complaints there. Just my 2 cents, and thank you for all your continuous contributions and amazing work! You all rock!

24601 commented 3 months ago

You are definitely right. We use Supabase in production, self-hosted, on k8s. I think (and I am guessing) that the chart writers' intent was not that people would use the provided PG instance, but that users (like us) would drop in an appropriate PG solution as necessary.

Now, that was not very easy, and I think you have a point: it could be made much, much easier. It was a lot of chart surgery for us to drop in StackGres in its place.

We are going to refactor our changes with some lessons learned and will PR them back at some point.


AntonOfTheWoods commented 3 months ago

> @AntonOfTheWoods I understand your concerns about putting the multi-node support in the short term goals. But scaling up services other than vector also mandates high-availability support. This is especially difficult to set postgres and storage/minio to StatefulSet due to their complex replication setup, hence placing them in the long term goals.
>
> However, it is a different story if the user uses external HA postgres and minio. Then it is viable to set vector to use DaemonSet setup as you've mentioned above. From what I can see, there shouldn't be many changes to other resources to do so.

I think you are over-complicating things. HA doesn't have a specific definition that means anything useful. Something that calls a support engineer who then clicks a button that reloads from a backup on a new server is "HA" in many organisations. "Highly" can mean any number of 9s and you can take any number of factors into account - or not. How many separate network providers do you need before you consider your setup "HA"?

> Now, that was not very easy, and I think you have a point that could be made much, much easier. It was a lot of chart surgery for us to drop in StackGres in lieu.

This is why I think it's just silly not to use bitnami where it makes sense. They have an excellent and very widely used system for making this easy, including proper secrets management. Their system has templates and formalisms for all of this that could simply be copy/pasted. While they do have an "HA" version of a postgres chart (I didn't have much success with it), once you use their abstractions it is very clear to everyone (who has very likely seen/used a bitnami chart before) how to swap things out, without having to spend a few hours making sure you are changing the right things in the right places.
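
For illustration, the widely copied Bitnami convention for swapping in an external database looks roughly like this in a chart's values. The exact keys vary between charts, so treat this as a sketch and check the specific chart's README:

```yaml
# Typical Bitnami-style values for using an external database
# instead of the bundled subchart (keys vary per chart).
postgresql:
  enabled: false                   # disable the bundled subchart
externalDatabase:
  host: my-stackgres-rw.db.svc.cluster.local
  port: 5432
  user: supabase
  database: supabase
  existingSecret: db-credentials   # secret holding the password
```

Because this pattern is so common, anyone who has used a Bitnami chart before knows immediately where to look to point the release at StackGres, RDS, or any other external Postgres.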

drpsyko101 commented 3 months ago

If you're familiar with bitnami images, I reckon you've looked at their postgresql-ha. PostgreSQL, like many other databases, needs some sort of replication at scale, whether master-standby, master-read, sharding, etc., to ensure that data stays consistent across nodes in the event of node/container failure. In the case of bitnami, it uses pgpool to route connections to the master and read replicas.
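
For context, the bitnami/postgresql-ha topology described here is roughly N postgres replicas managed by repmgr, fronted by stateless pgpool proxies. A minimal values sketch might look like this (key names as commonly documented for that chart; verify against its README before use):

```yaml
# bitnami/postgresql-ha sketch: repmgr-managed replicas behind pgpool
postgresql:
  replicaCount: 3    # one primary + two standbys; repmgr handles failover
pgpool:
  replicaCount: 2    # stateless proxies; writes routed to the primary
persistence:
  enabled: true
  size: 20Gi
```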

> While they do have an "HA" version of a Postgres chart, I didn't have much success with it...

This is exactly why it takes time to implement multi-node support.

As for the overall chart syntax, it can be slowly improved over time. Making several breaking changes without any care for the existing users is bad for the community, no?

AntonOfTheWoods commented 2 months ago

> As for the overall chart syntax, it can be slowly improved over time. Making several breaking changes without any care for the existing users is bad for the community, no?

I personally think that delivering a db module that is based on a deployment and has autoscaling clearly available in the template and values - even though what would happen if autoscaling did kick in is anyone's guess - is very much "without any care for the existing users".

In any case, I have been working on extending the bitnami supabase chart (recently updated with recent images for the supported modules) with the missing modules and it looks like it is going to be successful. That's the beauty of open source!

> This is exactly why it takes time to implement multi-node support.

And exactly why I am going to put my trust in bitnami. Honestly, there are so many horrible bugs in this chart that it is a danger to the community. Unless there is a massive "THIS IS NOT MULTI-NODE COMPATIBLE, AND YOU CAN'T USE THE PROVIDED postgresql OR minio" warning at the top of the readme, it's going to waste many people's time. Anyone who wants a single-node option would be a fool not to simply use the upstream-provided compose!

LinuxSuRen commented 2 months ago

Please consider publishing an OCI Helm chart, and cutting an early release for early-adoption users.
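
Publishing the chart as an OCI artifact is supported natively in Helm 3.8+; the workflow is roughly the following (the ghcr.io registry path below is hypothetical):

```shell
# Package the chart and push it to an OCI registry (Helm >= 3.8).
helm package charts/supabase
helm push supabase-0.1.0.tgz oci://ghcr.io/supabase-community/charts

# Consumers can then install directly from the registry,
# with no `helm repo add` step needed:
helm install supabase oci://ghcr.io/supabase-community/charts/supabase --version 0.1.0
```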

Thank @drpsyko101 for letting me know this thread. See also https://github.com/supabase-community/supabase-kubernetes/issues/56#issuecomment-2065937761

drpsyko101 commented 2 months ago

@AntonOfTheWoods I think we aren't on the same page here. Both of us want the chart to be multi-node compliant. But as I've said above, I'm okay starting with external db & S3 for multi-node deployment. As the chart progresses, we can then implement our own HA solutions to fit the needs of some users who require self-contained charts.

> Honestly, there are so many horrible bugs in this chart it is a danger to the community.

Not gonna deny this; it is no different from a vanilla `helm create`. I sincerely hope more PRs will be submitted to address that without many breaking changes.

We're still in version 0.1. There is a lot of room for improvement to make this chart better. Your contribution is very much appreciated!

bartekus commented 2 months ago

The good news is that, with Supabase going GA, bitnami updated their supabase helm charts (just 4 days ago).

AntonOfTheWoods commented 2 months ago

If anyone is interested in a bitnami-oriented take on all this, have a look at https://github.com/supafull/helm-charts/ . It still has a lot of rough edges (and not everything has been tested, like imgproxy, etc.), but it now has mostly the same level of feature support as this chart; it uses bitnami charts where possible, and the remaining pieces (analytics, functions and imgproxy) were heavily inspired by the bitnami way. That means it:

There isn't a lot of documentation yet, but I'm going to be putting it through its paces for a demo/example project using react-admin + electric-sql + supabase (before trying to migrate another real project to it), and (after my upcoming 5-day May-day break) I will be making sure it all works and giving it a damned good documenting (though I might not bother with testing AWS S3 or Google BigQuery unless I get some stars!).

Please let the feedback flow freely!