scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

Limit scylla operator to namespace scoped #1663

Open sourabhsarkar opened 6 months ago

sourabhsarkar commented 6 months ago

What should the feature do?

Hi, I have a requirement where, in a single Kubernetes cluster, multiple teams can have separate Scylla operator deployments under their own management. I am trying to make the operator namespace scoped, but I cannot find any switch in the CRD for which namespaces to watch. Any inputs on this would be helpful.

What is the use case behind this feature?

In multi-tenant clusters, limiting the Scylla operator to watching particular namespaces can help tenants manage their own Scylla operator and cluster deployments.

Anything else we need to know?

No response

tnozicka commented 6 months ago

As I've already mentioned, the operator is a cluster-level extension that needs its CustomResourceDefinitions updated along with it. You can't run multiple operators using the same CRD, as the versions would collide on the global resource. https://forum.scylladb.com/t/can-the-scylla-operator-be-installed-namespace-scoped/1137/2?u=tnozicka

In multi-tenant clusters, limiting the Scylla operator to watching particular namespaces can help tenants manage their own Scylla operator and cluster deployments.

Can you explain why you can't install the operator as cluster admin (you need it to install and update the CRD itself anyway) and grant user permissions to create ScyllaCluster CRs?
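
For reference, the model being suggested here can be sketched with namespace-scoped RBAC created by the cluster admin. This is only a sketch: the namespace "team-a", the group "team-a-admins", and the role name are hypothetical, and the API group/resource are assumed from the ScyllaCluster CRD (scyllaclusters.scylla.scylladb.com).

```go
// Sketch only: a cluster admin grants a tenant namespace-scoped rights to
// manage ScyllaCluster CRs while keeping the operator and its CRDs admin-owned.
package main

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Namespaced Role allowing full management of ScyllaCluster CRs only.
	role := &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "scyllacluster-editor", Namespace: "team-a"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"scylla.scylladb.com"},
			Resources: []string{"scyllaclusters"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete"},
		}},
	}
	// Bind the Role to a hypothetical tenant group.
	binding := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "scyllacluster-editor", Namespace: "team-a"},
		Subjects: []rbacv1.Subject{{
			Kind: rbacv1.GroupKind, APIGroup: rbacv1.GroupName, Name: "team-a-admins",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName, Kind: "Role", Name: "scyllacluster-editor",
		},
	}

	ctx := context.Background()
	if _, err := client.RbacV1().Roles("team-a").Create(ctx, role, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	if _, err := client.RbacV1().RoleBindings("team-a").Create(ctx, binding, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```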

steve-gray commented 6 months ago

@sourabhsarkar there have been multiple threads like this, including one where I tried at length to get @tnozicka to see any other point of view on this. Unfortunately they did not subscribe to any of the points raised, and ultimately declined to make the tiny changes involved to support this. I've tried to find it, but I assume it got purged. (Edit: found it, it was in a PR, not an issue.)

Specifically:

I'm already sick of maintaining my own fork of this just to fix the namespace hard-coding.

steve-gray commented 6 months ago

Some background reading on this topic, where many people have raised these points:

tnozicka commented 6 months ago

declined to make the tiny changes involved to support this.

As I recall, my main point was that those changes are far from tiny, but I also think that's off-topic here, as it's discussed in #1579. Configurable namespace names are far from supporting multiple instances of the operator in one cluster, or from making non-namespaced versions of namespaced CRs.

it's perfectly possible to have one that's configured to run at restricted scopes

That's a bit hard when you manage stuff like Node tuning / setup; obviously, globally scoped resources like that would conflict. Or the CSI drivers that need to run on every Node - I don't see how you can restrict that to a namespace.

and it's easily something that can just be the cluster administrator's concern regardless of the operator versions in use

"regardless" - it is the oposite - the CRD version must be in sync with the operator version - there are new fields / new CRDs that the operator relies on. Those are coupled and tolerate at most N-1 skew temporarily for updates.

While I could see we could eventually make the namespace name configurable (it's a tradeoff between the cost of the change, maintenance and the added value), I don't yet see a good reason to run multiple operator instances in the same cluster, nor how that would work for the whole stack including Node tuning, CSI driver, or keeping the admin managed CRDs in sync.

It's technically possible (but complex) to make namespaced level informers with client-go and disable half of the operator functionality but there would have to be a quite good use case to accept that level of complexity and someone who needs it would have to invest their time (weeks/months) to make it happen.
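
For context, client-go does support namespace-scoped shared informer factories. The following is only a minimal sketch of that mechanism, not the operator's actual wiring: the namespace "team-a" is a placeholder, and it watches StatefulSets rather than the operator's own types.

```go
// Minimal sketch: a shared informer factory restricted to one namespace.
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Informers built from this factory only list/watch objects in "team-a".
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Second,
		informers.WithNamespace("team-a"),
	)

	stsInformer := factory.Apps().V1().StatefulSets().Informer()
	stsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			sts := obj.(*appsv1.StatefulSet)
			fmt.Printf("observed StatefulSet %s/%s\n", sts.Namespace, sts.Name)
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, stsInformer.HasSynced)
	select {} // block forever in this sketch
}
```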

steve-gray commented 6 months ago

@tnozicka,

As I recall my main point was that those changes are far from tiny

No, they're not. The biggest single change here is to undo the commit you applied in the first referenced example, and then a single-digit number of changes elsewhere to make this dynamic. You willfully welded this particular hatch shut on namespace naming, for no real reason.

That's a bit hard when you manage stuff like Node tuning / setup,

Customer problems, not yours. It's up to me to know that my nodes are dedicated to Scylla, and the personal preference for the operator to be globally scoped and single-instance has precisely zero impact on the IO tuning process. I'm happy for you to correct me on this via email, at steve at zeroflucs.io - but I know manifestly this is not true. Holistically, what does node tuning have to do with namespace scoping? It's a total diversion that has no relevance to the topic.

"regardless" - it is the oposite - the CRD version must be in sync with the operator version

This is nonsense - and presents as wilfully obtuse. During an upgrade of any kind, the CRD and the instances of the CRs in the cluster will be out of step during the process. If I step up from Helm version 1 to version 2 in Terraform, it'll install the new CRs and instantly I'm in the same boat.

For each release you must consider back/forward compatibility, and this obligation is neither lessened nor increased if a cluster operator wants to run multiple instances. It's incumbent upon us to know the version compatibility ranges of the resources, and to make sure all operators and CRs step up at the cluster level when upgrading.

This is another example of a "valid point" with no specific bearing on the issue.

While I could see we could eventually make the namespace name configurable

It's essentially undoing a change you made before, and a tiny number of lines. I've been maintaining a fork for this for a year because of your refusal to adopt contemporary standards.

It's technically possible (but complex) to make namespaced level informers with client-go

Are you implying that an "if" statement to filter on a list of namespaces will break the back of the project? Traefik does this very well in terms of scoping (in fact, it has a fallback Helm variable for global scoping too, from memory?)
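
For illustration only, the kind of allow-list filter being described could be sketched with client-go's FilteringResourceEventHandler. The namespaces, the helper name, and the handler body below are placeholders, not operator code.

```go
// Sketch: wrap an existing event handler so only objects from an approved
// set of namespaces reach it.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"
)

// namespaceAllowList (hypothetical helper) filters events by namespace.
func namespaceAllowList(watched map[string]bool, inner cache.ResourceEventHandler) cache.ResourceEventHandler {
	return cache.FilteringResourceEventHandler{
		FilterFunc: func(obj interface{}) bool {
			accessor, err := meta.Accessor(obj)
			if err != nil {
				return false
			}
			// An empty allow-list falls back to watching everything (cluster-wide).
			return len(watched) == 0 || watched[accessor.GetNamespace()]
		},
		Handler: inner,
	}
}

func main() {
	watched := map[string]bool{"team-a": true, "team-b": true}
	handler := namespaceAllowList(watched, cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { fmt.Println("add event passed the namespace filter") },
	})
	_ = handler // would be passed to an informer's AddEventHandler
}
```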

but there would have to be a quite good use case

I've given explicit bullet points, and given you've ignored all of them - plus every other point anyone else has raised - we're clearly not considering general community standards for operators, cluster practices, or anything else. So let's flip this: what do you personally consider a sufficient case here?

tnozicka commented 6 months ago

@steve-gray I am afraid that, with the tone you set here, any point I make is irrelevant. Please understand that this is software you don't pay for, and your goals and priorities may not necessarily align with ours. It's perfectly fine to fork and develop the software to your liking with your own time invested; after all, this is open source. But with all due respect, please try not to bash on other people's issues.

steve-gray commented 6 months ago

@steve-gray I am afraid that, with the tone you set here, any point I make is irrelevant.

Valid points are valid points, regardless of their origin. You've yet to raise a solid defence of this hard-coded namespace behaviour that I've seen, but I'm happy to be corrected.

Please understand that this is software you don't pay for, and your goals and priorities may not necessarily align with ours

Everyone is generally pretty respectful of the work Scylla does in the Open Source space, but that is not a wave-off for any criticism of technical decisions that have been made, especially one like this where people have repeatedly articulated clear reasons why the earlier decision is problematic and makes adoption of the software difficult. If folks were doing something similarly non-idiomatic in the YUM packaging of the main project, I'm sure they'd have taken that on board, or at least been open to an improvement.

But with all due respect, please try not to bash on other people's issues.

Am I to take it that your view is that, because you've dismissed this previously without answering any of the points, anyone who's raised it before is precluded from chiming in when another user raises the exact same problem and adds weight to it? That's a somewhat disingenuous way to run an OSS project.

sourabhsarkar commented 6 months ago

@tnozicka @steve-gray let's discuss what needs to be done and the blockers to make this namespace scoped.

NodeConfig is optional, and CSI can be handled by dynamic-provisioning LVM operators.

So does the operator even need access to cluster-level Kubernetes resources like PVs and Nodes?

steve-gray commented 6 months ago

@tnozicka @steve-gray let's discuss what needs to be done and the blockers to make this namespace scoped.

NodeConfig is optional, and CSI can be handled by dynamic-provisioning LVM operators.

So does the operator even need access to cluster-level Kubernetes resources like PVs and Nodes?

No, it does not need anything of the sort. What this operator does, at its core, is schedule a StatefulSet - the StatefulSet has PVCs, which your volume provisioner will allocate PVs to fulfil. The nodes are selected by the Kubernetes scheduler, which will handle taints and tolerations passed from the custom resource. The node profile you select doesn't even need to exist at scheduling time if you're using something like Karpenter to provision nodes.

The operator can do deployment scheduling without any intrinsic access to, awareness of, or dependency on the volume provisioner, the node topology or anything else physical about the cluster.
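
To illustrate the claim above, here is a minimal sketch of the kind of object being described. The names, image, taint key, and mount path are placeholders, storage class and size requests are omitted, and this is not what the operator actually generates.

```go
// Illustrative only: a StatefulSet with volume claim templates and tolerations.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func exampleStatefulSet(namespace string) *appsv1.StatefulSet {
	labels := map[string]string{"app": "scylla-example"}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "scylla-example", Namespace: namespace},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: "scylla-example",
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Tolerations from the CR are passed through; node selection
					// itself is left to the Kubernetes scheduler.
					Tolerations: []corev1.Toleration{{
						Key:      "dedicated",
						Operator: corev1.TolerationOpEqual,
						Value:    "scylla",
						Effect:   corev1.TaintEffectNoSchedule,
					}},
					Containers: []corev1.Container{{
						Name:  "scylla",
						Image: "scylladb/scylla:latest",
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "data",
							MountPath: "/var/lib/scylla",
						}},
					}},
				},
			},
			// One PVC per replica; the storage provisioner binds PVs to them.
			// (Storage class and requested size omitted here for brevity.)
			VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
				ObjectMeta: metav1.ObjectMeta{Name: "data"},
				Spec: corev1.PersistentVolumeClaimSpec{
					AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
				},
			}},
		},
	}
}

func main() { _ = exampleStatefulSet("team-a") }
```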

The work is, roughly:

There is no NodeConfig-specific element here to really do. At this point, you can have a custom namespace.

The residual work to make this multi-instance is to tease out the scylla-manager-specific bits into their own thing (it's another example of "there can be only one" in places), but that probably belongs on the Scylla Manager repo for discussion. A v2 of this can come back and make the filtering occur in a way that doesn't require a global informer, since that's something of a sticking point, I gather (though I've been unable to understand why).

Edit: Just want to call out that there is a little more to the operator than "just" deploying a StatefulSet, with the cluster state introspection and upgrade controls, but these are essentially probing into the resources it has provisioned at runtime. Just in case anyone takes this as a write-off of some of the smarter stuff in the operator.

scylla-operator-bot[bot] commented 3 days ago

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

/lifecycle stale