scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

Limit scylla operator to namespace scoped #1663

Open sourabhsarkar opened 6 months ago

sourabhsarkar commented 6 months ago

What should the feature do?

Hi, I have a requirement where, in a single Kubernetes cluster, multiple teams can have separate Scylla operator deployments under their own management. I am trying to make the operator namespace scoped, but I cannot find any switch in the CRD for which namespaces to watch. Any inputs on this would be helpful.

What is the use case behind this feature?

In multi-tenant clusters, limiting the Scylla operator to watching particular namespaces can help tenants manage their own Scylla operator and cluster deployments.

Anything else we need to know?

No response

tnozicka commented 6 months ago

As I've already mentioned, the operator is a cluster-level extension that needs its CustomResourceDefinitions updated along with it. You can't run multiple operators using the same CRD, as the versions would collide on the global resource. https://forum.scylladb.com/t/can-the-scylla-operator-be-installed-namespace-scoped/1137/2?u=tnozicka

In multi-tenant clusters, limiting the Scylla operator to watching particular namespaces can help tenants manage their own Scylla operator and cluster deployments.

Can you explain why you can't install the operator as cluster admin (you need it to install and update the CRD itself anyway) and grant user permissions to create ScyllaCluster CRs?
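
For reference, the model being suggested here can be sketched with namespace-scoped RBAC created by the cluster admin. This is only a sketch: the namespace "team-a", the group "team-a-admins", and the role name are hypothetical, and the API group/resource are assumed from the ScyllaCluster CRD (scyllaclusters.scylla.scylladb.com).

```go
// Sketch only: a cluster admin grants a tenant namespace-scoped rights to
// manage ScyllaCluster CRs while keeping the operator and its CRDs admin-owned.
package main

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Namespaced Role allowing full management of ScyllaCluster CRs only.
	role := &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "scyllacluster-editor", Namespace: "team-a"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"scylla.scylladb.com"},
			Resources: []string{"scyllaclusters"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
		}},
	}
	// Bind the Role to a hypothetical tenant group.
	binding := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "scyllacluster-editor", Namespace: "team-a"},
		Subjects: []rbacv1.Subject{{
			Kind: rbacv1.GroupKind, APIGroup: rbacv1.GroupName, Name: "team-a-admins",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName, Kind: "Role", Name: "scyllacluster-editor",
		},
	}

	ctx := context.Background()
	if _, err := client.RbacV1().Roles("team-a").Create(ctx, role, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	if _, err := client.RbacV1().RoleBindings("team-a").Create(ctx, binding, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```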

steve-gray commented 6 months ago

@sourabhsarkar there have been multiple threads like this, including one where I tried at length to get @tnozicka to see any other point of view on this. Unfortunately they did not subscribe to any of the points raised, and ultimately declined to make the tiny changes involved to support this. I've tried to find it, but I assume it got purged. (Edit: found it, it was in a PR, not an issue.)

Specifically:

I'm already sick of maintaining my own fork of this just to fix the namespace hard-coding.

steve-gray commented 6 months ago

Some background reading on this topic, where many people have raised these points:

tnozicka commented 6 months ago

declined to make the tiny changes involved to support this.

As I recall, my main point was that those changes are far from tiny, but I also think that's off-topic here, as it's discussed in #1579. Configurable namespace names are far from supporting multiple instances of the operator in one cluster, or from making non-namespaced versions of namespaced CRs.

it's perfectly possible to have one that's configured to run at restricted scopes

That's a bit hard when you manage stuff like Node tuning / setup; obviously, globally scoped resources like that would conflict. Or the CSI drivers that need to run on every Node - I don't see how you can restrict that to a namespace.

and it's easily something that can just be the cluster administrator's concern regardless of the operator versions in use

"regardless" - it is the oposite - the CRD version must be in sync with the operator version - there are new fields / new CRDs that the operator relies on. Those are coupled and tolerate at most N-1 skew temporarily for updates.

While I could see we could eventually make the namespace name configurable (it's a tradeoff between the cost of the change, maintenance and the added value), I don't yet see a good reason to run multiple operator instances in the same cluster, nor how that would work for the whole stack including Node tuning, CSI driver, or keeping the admin managed CRDs in sync.

It's technically possible (but complex) to make namespaced level informers with client-go and disable half of the operator functionality but there would have to be a quite good use case to accept that level of complexity and someone who needs it would have to invest their time (weeks/months) to make it happen.
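
For context, client-go does support namespace-scoped shared informer factories. The following is only a minimal sketch of that mechanism, not the operator's actual wiring: the namespace "team-a" is a placeholder, and it watches StatefulSets rather than the operator's own types.

```go
// Minimal sketch: a shared informer factory restricted to one namespace.
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Informers built from this factory only list/watch objects in "team-a".
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Second,
		informers.WithNamespace("team-a"),
	)

	stsInformer := factory.Apps().V1().StatefulSets().Informer()
	stsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			sts := obj.(*appsv1.StatefulSet)
			fmt.Printf("observed StatefulSet %s/%s\n", sts.Namespace, sts.Name)
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, stsInformer.HasSynced)
	select {} // block forever in this sketch
}
```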

steve-gray commented 6 months ago

@tnozicka,

As I recall my main point was that those changes are far from tiny

No, they're not. The biggest single change here is to undo the commit you applied in the first referenced example, and then a single-digit number of changes elsewhere to make this dynamic. You willfully welded this particular hatch shut on namespace naming, for no real reason.

That's a bit hard when you manage stuff like Node tuning / setup,

Customer problems, not yours. It's up to me to know that my nodes are dedicated to Scylla, and the personal preference for the operator to be globally scoped and single-instance has precisely zero impact on the IO tuning process. I'm happy for you to correct me on this via email, at steve at zeroflucs.io - but I know manifestly this is not true. Holistically, what does node tuning have to do with namespace scoping? It's a total diversion that has no relevance to the topic.

"regardless" - it is the oposite - the CRD version must be in sync with the operator version

This is nonsense - and presents as wilfully obtuse. During an upgrade of any kind, the CRD and the instances of the CRs in the cluster will be out of step during the process. If I step up from Helm version 1 to version 2 in Terraform, it'll install the new CRs and instantly I'm in the same boat.

For each release you must consider back/forward compatibility, and this obligation is neither lessened nor increased if a cluster operator wants to run multiple instances. It's incumbent upon us to know the version compatibility ranges of the resources, and to make sure all operators and CRs step up at the cluster level when upgrading.

This is another example of a "valid point" with no specific bearing on the issue.

While I could see we could eventually make the namespace name configurable

It's essentially undoing a change you made before, and a tiny number of lines. I've been maintaining a fork for this for a year because of your refusal to adopt contemporary standards.

It's technically possible (but complex) to make namespaced level informers with client-go

Are you implying that an "if" statement to filter on a list of namespaces will break the back of the project? Traefik does this very well in terms of scoping (in fact, it has a fallback Helm variable for global scoping too, from memory?)
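
For illustration only, the kind of allow-list filter being described could be sketched with client-go's FilteringResourceEventHandler. The namespaces, the helper name, and the handler body below are placeholders, not operator code.

```go
// Sketch: wrap an existing event handler so only objects from an approved
// set of namespaces reach it.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"
)

// namespaceAllowList (hypothetical helper) filters events by namespace.
func namespaceAllowList(watched map[string]bool, inner cache.ResourceEventHandler) cache.ResourceEventHandler {
	return cache.FilteringResourceEventHandler{
		FilterFunc: func(obj interface{}) bool {
			accessor, err := meta.Accessor(obj)
			if err != nil {
				return false
			}
			// An empty allow-list falls back to watching everything (cluster-wide).
			return len(watched) == 0 || watched[accessor.GetNamespace()]
		},
		Handler: inner,
	}
}

func main() {
	watched := map[string]bool{"team-a": true, "team-b": true}
	handler := namespaceAllowList(watched, cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("add event passed the namespace filter") },
	})
	_ = handler // would be passed to an informer's AddEventHandler
}
```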

but there would have to be a quite good use case

I've given explicit bullet points, and given you've ignored all of them - plus every other point anyone else has raised - we're clearly not considering general community standards for operators, cluster practices, or anything else. So let's flip this: what do you personally consider a sufficient case here?

tnozicka commented 6 months ago

@steve-gray I am afraid that, with the tone you set here, any point I make is irrelevant. Please understand that this is software you don't pay for, and your goals and priorities may not necessarily align with ours. It's perfectly fine to fork and develop the software to your liking with your own time invested; after all, this is open source. But with all due respect, please try not to bash on other people's issues.

steve-gray commented 6 months ago

@steve-gray I am afraid that, with the tone you set here, any point I make is irrelevant.

Valid points are valid points, regardless of their origin. You've yet to raise a solid defence of this hard-coded namespace behaviour that I've seen, but I'm happy to be corrected.

Please understand that this is software you don't pay for, and your goals and priorities may not necessarily align with ours

Everyone is generally pretty respectful of the work Scylla does in the Open Source space, but that is not a wave-off for any criticism of technical decisions that have been made, especially one like this where people have repeatedly articulated clear reasons why the earlier decision is problematic and makes adoption of the software difficult. If folks were doing something similarly non-idiomatic in the YUM packaging of the main project, I'm sure they'd have taken that on board, or at least been open to an improvement.

But with all due respect, please try not to bash on other people's issues.

Am I to take it that your view is that, because you've dismissed this previously without answering any of the points, anyone who's raised it before is precluded from chiming in when another user raises the exact same problem and adds weight to it? That's a somewhat disingenuous way to run an OSS project.

sourabhsarkar commented 6 months ago

@tnozicka @steve-gray let's discuss what needs to be done and the blockers to make this namespace scoped.

NodeConfig is optional, and CSI can be handled by dynamic-provisioning LVM operators.

So does the operator even need access to cluster-level Kubernetes resources like PVs and Nodes?

steve-gray commented 6 months ago

@tnozicka @steve-gray let's discuss what needs to be done and the blockers to make this namespace scoped.

NodeConfig is optional, and CSI can be handled by dynamic-provisioning LVM operators.

So does the operator even need access to cluster-level Kubernetes resources like PVs and Nodes?

No, it does not need anything of the sort. What this operator does, at its core, is schedule a StatefulSet - the StatefulSet has PVCs, which your volume provisioner will allocate PVs to fulfil. The nodes are selected by the Kubernetes scheduler, which will handle taints and tolerations passed from the custom resource. The node profile you select doesn't even need to exist at scheduling time if you're using something like Karpenter to provision nodes.

The operator can do deployment scheduling without any intrinsic access to, awareness of, or dependency on the volume provisioner, the node topology or anything else physical about the cluster.
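
To illustrate the claim above, here is a minimal sketch of the kind of object being described. The names, image, taint key, and mount path are placeholders, storage class and size requests are omitted, and this is not what the operator actually generates.

```go
// Illustrative only: a StatefulSet with volume claim templates and tolerations.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func exampleStatefulSet(namespace string) *appsv1.StatefulSet {
	labels := map[string]string{"app": "scylla-example"}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "scylla-example", Namespace: namespace},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: "scylla-example",
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Tolerations from the CR are passed through; node selection
					// itself is left to the Kubernetes scheduler.
					Tolerations: []corev1.Toleration{{
						Key:      "dedicated",
						Operator: corev1.TolerationOpEqual,
						Value:    "scylla",
						Effect:   corev1.TaintEffectNoSchedule,
					}},
					Containers: []corev1.Container{{
						Name:  "scylla",
						Image: "scylladb/scylla:latest",
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "data",
							MountPath: "/var/lib/scylla",
						}},
					}},
				},
			},
			// One PVC per replica; the storage provisioner binds PVs to them.
			// (Storage class and requested size omitted here for brevity.)
			VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
				ObjectMeta: metav1.ObjectMeta{Name: "data"},
				Spec: corev1.PersistentVolumeClaimSpec{
					AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
				},
			}},
		},
	}
}

func main() { _ = exampleStatefulSet("team-a") }
```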

The work is, roughly:

There is no NodeConfig-specific element here to really do. At this point, you can have a custom namespace.

The residual work to make this multi-instance is to tease out the scylla-manager-specific bits into their own thing (it's another example of "there can be only one" in places), but that probably belongs on the Scylla Manager repo for discussion. A v2 of this can come back and make the filtering occur in a way that doesn't require a global informer, since that's something of a sticking point, I gather (though I've been unable to understand why).

Edit: Just want to call out that there is a little more to the operator than "just" deploying a StatefulSet, with the cluster state introspection and upgrade controls, but these are essentially probing into the resources it has provisioned at runtime. Just in case anyone takes this as a write-off of some of the smarter stuff in the operator.

scylla-operator-bot[bot] commented 3 days ago

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

/lifecycle stale