Closed iLem0n closed 2 years ago
You can deactivate CRD validation by setting `quarkus.operator-sdk.crd.validate=false` in `application.properties`. How is your ServiceAccount defined? It seems that it's limited to the `default` namespace, which means that it cannot access cluster-wide resources such as CRDs…
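For reference, a `RoleBinding` only grants rights within its own namespace; cluster-wide access requires a `ClusterRoleBinding`. A minimal sketch (binding the `flink-operator` service account; the binding name and the choice of role are illustrative):

```yaml
# Sketch only: a ClusterRoleBinding grants the referenced ClusterRole
# across ALL namespaces, unlike a RoleBinding, which is namespace-scoped.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flink-operator-cluster-access   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view   # or a custom ClusterRole with the rights the operator needs
subjects:
  - kind: ServiceAccount
    name: flink-operator
    namespace: default   # the namespace the ServiceAccount lives in
```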
Using `quarkus.operator-sdk.crd.validate=false`, I got the following:
2021-10-01 21:19:27,561 WARN [io.fab.kub.cli.dsl.int.WatcherWebSocketListener] (OkHttp https://10.0.0.1/...) Exec Failure: HTTP 403, Status: 403 - flinksessions.de.ilem0n is forbidden: User "system:serviceaccount:default:flink-operator" cannot watch resource "flinksessions" in API group "de.ilem0n" at the cluster scope: java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkUpgradeSuccess(RealWebSocket.java:224)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:195)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2021-10-01 21:19:27,567 ERROR [io.qua.run.Application] (main) Failed to start application (with profile prod): io.fabric8.kubernetes.client.KubernetesClientException: flinksessions.de.ilem0n is forbidden: User "system:serviceaccount:default:flink-operator" cannot watch resource "flinksessions" in API group "de.ilem0n" at the cluster scope
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:98)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:199)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Suppressed: java.lang.Throwable: waiting here
at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:151)
at io.fabric8.kubernetes.client.dsl.internal.WebSocketClientRunner.waitUntilReady(WebSocketClientRunner.java:50)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.waitUntilReady(AbstractWatchManager.java:164)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:822)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:795)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:86)
at io.javaoperatorsdk.operator.processing.event.internal.CustomResourceEventSource.start(CustomResourceEventSource.java:87)
at io.javaoperatorsdk.operator.processing.event.DefaultEventSourceManager.registerEventSource(DefaultEventSourceManager.java:85)
at io.javaoperatorsdk.operator.processing.event.DefaultEventSourceManager.<init>(DefaultEventSourceManager.java:49)
at io.javaoperatorsdk.operator.Operator.register(Operator.java:141)
at io.javaoperatorsdk.operator.Operator.register(Operator.java:83)
at io.quarkiverse.operatorsdk.runtime.OperatorProducer.operator(OperatorProducer.java:26)
at io.quarkiverse.operatorsdk.runtime.OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.create(OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.zig:346)
at io.quarkiverse.operatorsdk.runtime.OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.create(OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.zig:361)
at io.quarkus.arc.impl.AbstractSharedContext.createInstanceHandle(AbstractSharedContext.java:96)
at io.quarkus.arc.impl.AbstractSharedContext.access$000(AbstractSharedContext.java:14)
at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:29)
at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:26)
at io.quarkus.arc.impl.LazyValue.get(LazyValue.java:26)
at io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:69)
at io.quarkus.arc.impl.AbstractSharedContext.get(AbstractSharedContext.java:26)
at io.quarkiverse.operatorsdk.runtime.OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.get(OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.zig:393)
at io.quarkiverse.operatorsdk.runtime.OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.get(OperatorProducer_ProducerMethod_operator_b4530a54321662e5391d4f625ba48900cce797ee_Bean.zig:409)
at ilem0n.de.FlinkOperator_Bean.create(FlinkOperator_Bean.zig:202)
at ilem0n.de.FlinkOperator_Bean.create(FlinkOperator_Bean.zig:242)
at io.quarkus.arc.impl.AbstractSharedContext.createInstanceHandle(AbstractSharedContext.java:96)
at io.quarkus.arc.impl.AbstractSharedContext.access$000(AbstractSharedContext.java:14)
at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:29)
at io.quarkus.arc.impl.AbstractSharedContext$1.get(AbstractSharedContext.java:26)
at io.quarkus.arc.impl.LazyValue.get(LazyValue.java:26)
at io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:69)
at io.quarkus.arc.impl.AbstractSharedContext.get(AbstractSharedContext.java:26)
at io.quarkus.arc.impl.ClientProxies.getApplicationScopedDelegate(ClientProxies.java:17)
at ilem0n.de.FlinkOperator_ClientProxy.arc$delegate(FlinkOperator_ClientProxy.zig:67)
at ilem0n.de.FlinkOperator_ClientProxy.run(FlinkOperator_ClientProxy.zig:126)
at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:122)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:66)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:42)
at ilem0n.de.FlinkOperator.main(FlinkOperator.java:16)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:48)
at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:25)
2021-10-01 21:19:27,588 INFO [io.qua.it.ope.cli.run.OpenShiftClientProducer] (main) Closing OpenShift client
2021-10-01 21:19:27,600 INFO [io.quarkus] (main) flink stopped in 0.029s
Nothing was changed from the out-of-the-box configuration.
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    # ...
  labels:
    app.kubernetes.io/name: flink-operator
    app.kubernetes.io/version: 0.0.4
  name: flink-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flink-operator-view
roleRef:
  kind: ClusterRole
  apiGroup: rbac.authorization.k8s.io
  name: view
subjects:
  - kind: ServiceAccount
    name: flink-operator
---
The service account is bound to the ClusterRole `view`, which has the following rights:
Name: view
Labels: kubernetes.io/bootstrapping=rbac-defaults
rbac.authorization.k8s.io/aggregate-to-edit=true
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
bindings [] [] [get list watch]
configmaps [] [] [get list watch]
endpoints [] [] [get list watch]
events [] [] [get list watch]
limitranges [] [] [get list watch]
namespaces/status [] [] [get list watch]
namespaces [] [] [get list watch]
persistentvolumeclaims/status [] [] [get list watch]
persistentvolumeclaims [] [] [get list watch]
pods/log [] [] [get list watch]
pods/status [] [] [get list watch]
pods [] [] [get list watch]
replicationcontrollers/scale [] [] [get list watch]
replicationcontrollers/status [] [] [get list watch]
replicationcontrollers [] [] [get list watch]
resourcequotas/status [] [] [get list watch]
resourcequotas [] [] [get list watch]
serviceaccounts [] [] [get list watch]
services/status [] [] [get list watch]
services [] [] [get list watch]
controllerrevisions.apps [] [] [get list watch]
daemonsets.apps/status [] [] [get list watch]
daemonsets.apps [] [] [get list watch]
deployments.apps/scale [] [] [get list watch]
deployments.apps/status [] [] [get list watch]
deployments.apps [] [] [get list watch]
replicasets.apps/scale [] [] [get list watch]
replicasets.apps/status [] [] [get list watch]
replicasets.apps [] [] [get list watch]
statefulsets.apps/scale [] [] [get list watch]
statefulsets.apps/status [] [] [get list watch]
statefulsets.apps [] [] [get list watch]
horizontalpodautoscalers.autoscaling/status [] [] [get list watch]
horizontalpodautoscalers.autoscaling [] [] [get list watch]
cronjobs.batch/status [] [] [get list watch]
cronjobs.batch [] [] [get list watch]
jobs.batch/status [] [] [get list watch]
jobs.batch [] [] [get list watch]
daemonsets.extensions/status [] [] [get list watch]
daemonsets.extensions [] [] [get list watch]
deployments.extensions/scale [] [] [get list watch]
deployments.extensions/status [] [] [get list watch]
deployments.extensions [] [] [get list watch]
ingresses.extensions/status [] [] [get list watch]
ingresses.extensions [] [] [get list watch]
networkpolicies.extensions [] [] [get list watch]
replicasets.extensions/scale [] [] [get list watch]
replicasets.extensions/status [] [] [get list watch]
replicasets.extensions [] [] [get list watch]
replicationcontrollers.extensions/scale [] [] [get list watch]
ingresses.networking.k8s.io/status [] [] [get list watch]
ingresses.networking.k8s.io [] [] [get list watch]
networkpolicies.networking.k8s.io [] [] [get list watch]
poddisruptionbudgets.policy/status [] [] [get list watch]
poddisruptionbudgets.policy [] [] [get list watch]
Since there was no rule for `customresourcedefinitions`, I added the extra role, but it seems that this is also not enough. Should the extra ruleset from the `flink-operator-extended` role mentioned above be enough?
Unfortunately it's a corporate-managed cluster and I'm not sure whether there are any other restrictions that I cannot see.
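For context, a rule granting read access to CRDs would look roughly like this (a sketch; the role name is hypothetical, and the actual `flink-operator-extended` role is not shown in this thread):

```yaml
# CRDs live in the apiextensions.k8s.io group and are cluster-scoped,
# so this ClusterRole must be attached via a ClusterRoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: crd-reader   # hypothetical name
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "list", "watch"]
```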
How is your controller configured? Another thing: since you're running on an enterprise cluster, your service account might not have rights at the cluster level, which is apparently what you're trying to use… is your custom resource cluster-scoped?
I'm not sure what exactly you mean by 'How is your controller configured?'.
I have created a minimal crashing example here. As you can see, the CustomResource is namespaced.
What are the necessary rights on the cluster? As far as I can see, this should be:
* get | watch | list: on the `customresourcedefinitions`, cluster-wide

PS: are these flags like `quarkus.operator-sdk.crd.validate=false` documented anywhere? I couldn't find any information about this.
I'm not sure what exactly you mean by 'How is your controller configured?'.
There are several options you can select to configure different aspects of the controller: its name, which namespaces it watches, etc. Some of it is done via the `@Controller` annotation, some via properties.
I have created a minimal crashing example here. As you can see, the CustomResource is namespaced.
What are the necessary rights on the cluster? As far as I can see, this should be:
* get | watch | list: on the `customresourcedefinitions`, cluster-wide
This shouldn't be needed if you deactivate the CR validation.
* get | watch | list | update | patch: on each CR (FlinkSession, FlinkJob) itself, cluster-wide. Is there anything additional?
Yes, that should be enough, I think. Maybe you just don't have the permissions to give your operator these rights on your cluster?
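Translated into a manifest, the rights listed above might look roughly like this (a sketch; the API group and the `flinksessions` plural are taken from the error message earlier in the thread, while the `flinkjobs` plural and the role name are assumptions):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flink-operator-crs   # illustrative name
rules:
  # get/list/watch to observe the CRs, update/patch to write status
  # and finalizers; bind this via a ClusterRoleBinding for cluster-wide scope.
  - apiGroups: ["de.ilem0n"]
    resources:
      - flinksessions
      - flinksessions/status
      - flinkjobs          # assumed plural of FlinkJob
      - flinkjobs/status
    verbs: ["get", "list", "watch", "update", "patch"]
```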
PS: are these flags like `quarkus.operator-sdk.crd.validate=false` documented anywhere? I couldn't find any information about this.
Documentation is indeed a sore point we need to work on. 😢
I've faced this issue, too. When using operator-sdk with Helm, Ansible, or Go, the SDK generates the RBAC manifests for the service account automatically, but this quarkus-operator-sdk doesn't, so it seems that we have to assign the ClusterRole and ClusterRoleBinding manually.
Hi @teruz, you make a valid point, but I don't think that's the issue here…
Indeed, it would be marvelous if the extension could generate the RBAC out of the box in the `target/kubernetes` directory together with the target deployment, for instance in `minikube.yml`.
What I did:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-operator
rules:
  - apiGroups:
      - com.example
    resources:
      - mycustomresource
    verbs:
      - "*"
  - apiGroups:
      - com.example
    resources:
      - mycustomresource/status
    verbs:
      - "*"
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - "get"
      - "list"
  # add here the core resources you need to manage
  #- apiGroups:
  #    - ""
  #  resources:
  #    - secrets
  #  verbs:
  #    - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-operator
roleRef:
  kind: ClusterRole
  apiGroup: rbac.authorization.k8s.io
  name: my-operator
subjects:
  - kind: ServiceAccount
    name: my-operator
Name the file `minikube.yml` (a `.yaml` extension WON'T WORK!!) and build with `mvn clean install -Dquarkus.kubernetes.deployment-target=minikube -Dquarkus.kubernetes.namespace=default`. The namespace is important since the `ClusterRoleBinding` needs to identify your SA. Enjoy.
@metacosm I think we could maybe add these RBAC resources to the extension?
What are the necessary rights on the cluster?
Take a look at my comment above. Your SA must be tied to a `ClusterRoleBinding`, otherwise the access to `customresourcedefinitions` won't work.
This shouldn't be needed if you deactivate the CR validation.
I tried this property and it tries to validate the CR either way, hence requiring the `customresourcedefinitions` permissions.
Hi @ricardozanini, that's the plan, yes, but indeed, any `kubernetes.yml`/`openshift.yml`/`minikube.yml` fragment you put in `src/main/resources` will get merged with what Quarkus generates.
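A fragment sketch, assuming the merging behavior just described (file location and resource names are illustrative, reusing the `my-operator` naming from the example above):

```yaml
# src/main/resources/kubernetes.yml — merged into the generated manifests at build time
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-operator
subjects:
  - kind: ServiceAccount
    name: my-operator
    namespace: default   # must match the namespace the SA is deployed to
```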
@metacosm is there a way to have the same file replicated for both envs without having to clone it? In this example, I want `minikube.yml` to also be applied to the kubernetes and openshift targets. 🤔
I'm not sure, but I think the proper way to do it is to use the `kubernetes.yml` file, which, if I'm not mistaken, should be applied to all flavors of cluster.
A `ClusterRole` and `ClusterRoleBinding` are needed to give your operator the permissions on its owned custom resources. Unfortunately, we cannot automatically add permissions for dependent/secondary resources, but we're looking into options at the SDK level to improve the situation. This should be fixed in 2.0.3.
Hello,
I was developing an operator and tested it in my local Docker Desktop Kubernetes environment, where everything ran fine. Then I wanted to run it in the production environment, where I ran into the following error (listed below).
This can be reproduced on an Azure Kubernetes cluster and also on a local minikube cluster; there are no problems when running the operator in the IDE (IntelliJ) connected to the Azure cluster.
I also tested a freshly generated project with the following commands:
Error:
As a workaround, I created an additional cluster role and bound it to the service account using:
After this I could see ``:
But the issue persists. What could cause this and how can it be fixed?