planetscale / vitess-operator

Kubernetes Operator for Vitess
Apache License 2.0

Ready for production in 2023? #394

Open fzyzcjy opened 1 year ago

fzyzcjy commented 1 year ago

Hi, thanks for the operator! I wonder whether it is ready for production, or recommended for production use, in 2023?

frouioui commented 1 year ago

Hey @fzyzcjy, thank you for your interest in the vitess-operator. The vitess-operator has been used in production by several companies for a while now. The Vitess maintainers maintain this operator alongside Vitess and make sure it stays compatible as we release new versions of Vitess.

fzyzcjy commented 1 year ago

@frouioui Hi, thanks for your reply! Given that this operator lives under planetscale, I wonder whether it is stable for non-planetscale users?

frouioui commented 1 year ago

@fzyzcjy the vitess-operator is owned by planetscale and is the only OSS k8s operator for Vitess. It is entirely stable for non-planetscale users.

lisachenko-indriver commented 1 year ago

While I'm really happy that this operator exists, it still needs a lot of work to be truly production-ready. I want to highlight this because many engineers have serious trouble understanding how Vitess works and how to monitor it.

For example, the operator doesn't create a monitoring.coreos.com/v1 ServiceMonitor or PodMonitor for Prometheus by default, so you have to create them manually for your cluster; otherwise none of the metrics from vttablet (the /metrics endpoint) or from MySQL itself (via the mysqld_exporter sidecar container) will be collected. You will absolutely need this in production, so expect a separate task for your DevOps team to configure it manually. Ideally, the operator would create these resources when you set just one field in the Vitess cluster definition: "Hey, I would like to have Service and Pod Prometheus monitors enabled for my Vitess cluster."
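As a rough illustration of that manual step, here is a minimal sketch that builds a prometheus-operator PodMonitor manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. It assumes the Vitess pods carry a `planetscale.com/component` label and expose metrics on named container ports (`web` for vttablet and `metrics` for the exporter sidecar are hypothetical names here); verify the real labels and port names with `kubectl get pod -o yaml` before applying. The function `build_pod_monitor` is likewise hypothetical.

```python
import json


def build_pod_monitor(name, namespace, component="vttablet",
                      port_names=("web",), path="/metrics", interval="30s",
                      extra_labels=None):
    """Build a prometheus-operator PodMonitor manifest as a plain dict.

    ASSUMPTIONS (verify against your cluster): Vitess pods are selected via a
    'planetscale.com/component' label, and each entry in port_names is the
    *name* of a container port that serves Prometheus metrics at `path`.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Prometheus only picks up PodMonitors matching its podMonitorSelector,
            # so callers may need to add a label such as {"release": "prometheus"}.
            "labels": dict(extra_labels or {}),
        },
        "spec": {
            # Select pods only in the PodMonitor's own namespace.
            "namespaceSelector": {"matchNames": [namespace]},
            "selector": {"matchLabels": {"planetscale.com/component": component}},
            # One scrape endpoint per named container port.
            "podMetricsEndpoints": [
                {"port": port, "path": path, "interval": interval}
                for port in port_names
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical port names: "web" for vttablet, "metrics" for the exporter sidecar.
    manifest = build_pod_monitor("vitess-vttablet", "vitess",
                                 port_names=("web", "metrics"))
    print(json.dumps(manifest, indent=2))
```

Save the output to a file and pass it to `kubectl apply -f`. A ServiceMonitor would follow the same shape, but it selects Services and lists scrape targets under `endpoints` instead of `podMetricsEndpoints`.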

I also think the operator could benefit from automatically creating Grafana dashboards via GrafanaDashboard CRDs and grafana-operator, based on all the scraped Prometheus metrics. Without such a dashboard, DBAs and DevOps engineers are flying blind. The existing community vitess-mixin doesn't cover all the exported metrics a DBA cares about, so the DBA/DevOps team first has to dig through all the available metrics, learn them the hard way, and then define acceptable thresholds for the monitoring team to watch. Again, Vitess is complex software and it needs detailed monitoring if it is used in production with real databases. All replication streams, running workflows, and lags should therefore be visible on this dashboard.

Shard maintenance isn't currently supported through the operator (or I'm not aware of it). This means that if you want to take one node of a shard down for an update, you will likely have to do it at a low level rather than via the usual Service/Pod annotations. I can see that some annotations are implemented internally in the operator, but they are not documented and I'm not sure how to use them properly.

The good part: there are a lot of clever ideas that could be contributed and implemented :)

fzyzcjy commented 1 year ago

@frouioui Thanks for the reply!

fzyzcjy commented 1 year ago

@lisachenko-indriver Thank you for the suggestions! I'm curious: are you using Vitess with real databases, and do you find it stable and easy to maintain?

lisachenko-indriver commented 1 year ago

@fzyzcjy We are currently evaluating Vitess, so I can't share many details yet; this is only a very early evaluation of Vitess and the planetscale operator.

However, a lot of experts (including some from Percona and AWS) report that running Vitess properly in production can be very tricky. That's why you should first invest a lot of time in understanding how to use it, which limitations apply to SQL queries, how sharding works, and so on.

fzyzcjy commented 1 year ago

@lisachenko-indriver Thank you for the information! By the way, I'm curious what else you are evaluating. I have been considering things like:

Areso commented 1 year ago

Totally agree with Alexander. I'd like to point out that we had problems running the Vitess operator, so we were forced to use the operator from vitessio (https://github.com/vitessio/vitess) instead, and they aren't the same. @fzyzcjy MongoDB isn't compatible with MySQL, so we skipped it; TiDB was on our comparison list, ShardingSphere wasn't. The community here is pretty welcoming, but still, if you can't solve your problems (with the help of the community), nobody will solve them for you. For example, as far as I know, neither Percona nor any other major value-added integrator supports Vitess.

fzyzcjy commented 1 year ago

@Areso Thanks for the information! I'm curious, which one did you finally choose?

Areso commented 1 year ago

Still working on it. So far our favorite is Vitess, but it's not a silver bullet. We're still trying to get it running within our constraints.