linsoss / doris-operator

An operator for Apache Doris that manages Doris cluster and observability components through Kubernetes CRs 😆
https://linsoss.github.io/doris-operator/
Apache License 2.0

Helm: Failed to choose a container for doris-operator-controller-manager. #14

Closed Phoenix500526 closed 10 months ago

Phoenix500526 commented 1 year ago

Following the steps in the documentation, I installed doris-operator using Helm. During startup, the pod doris-operator-controller-manager-f5cb98fd5-9kw2d stays in CrashLoopBackOff.

$ kubectl get pods
NAME                                                READY   STATUS             RESTARTS   AGE
doris-operator-controller-manager-f5cb98fd5-9kw2d   1/2     CrashLoopBackOff   32         3h43m

Running kubectl logs doris-operator-controller-manager-f5cb98fd5-9kw2d to view the pod's logs returns the following:

Error from server (BadRequest): a container name must be specified for pod doris-operator-controller-manager-f5cb98fd5-9kw2d, choose one of: [kube-rbac-proxy manager]

It seems we should specify the container name for doris-operator-controller-manager.

My environment is shown below:

Distribution: Ubuntu 22.04
Kernel: Linux master 5.15.0-87-generic
Kubernetes: 1.16.15
Helm: v3.13.1

Phoenix500526 commented 1 year ago

The default container is specified in file, but for some unknown reason this annotation doesn't work.

Al-assad commented 1 year ago

@Phoenix500526 Hi Phoenix,

Error from server (BadRequest): a container name must be specified for pod doris-operator-controller-manager-f5cb98fd5-9kw2d, choose one of: [kube-rbac-proxy manager] 

That's not an error in the doris-operator-controller-manager logs. It's kubectl telling you that you need to pass the name of the container whose logs you want to see. Try:

kubectl logs -f doris-operator-controller-manager-f5cb98fd5-9kw2d -c kube-rbac-proxy

Or:

kubectl logs -f doris-operator-controller-manager-f5cb98fd5-9kw2d -c manager

Al-assad commented 1 year ago

The default container is specified in file, but for some unknown reason this annotation doesn't work.

AFAIK the kubectl.kubernetes.io/default-logs-container annotation only takes effect starting with Kubernetes 1.18.

Phoenix500526 commented 1 year ago

OMG, my mistake. I'm new to k8s and operators. Here is the error log:

$ kubectl logs -f doris-operator-controller-manager-f5cb98fd5-9kw2d  -c manager
2023-10-30T14:25:46Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2023-10-30T14:25:46Z    INFO    setup   starting manager                                                            
2023-10-30T14:25:46Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}                                                                                                                        
I1030 14:25:46.610571       1 leaderelection.go:245] attempting to acquire leader lease default/0a2dfd6b.al-assad.github.io...
2023-10-30T14:25:46Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}                                                                                                                                            
I1030 14:26:02.695739       1 leaderelection.go:255] successfully acquired lease default/0a2dfd6b.al-assad.github.io                                                                                                                     
2023-10-30T14:26:02Z    DEBUG   events  doris-operator-controller-manager-f5cb98fd5-9kw2d_a051a7c6-099e-40e8-ae0e-7f9e53eb6b34 became leader    {"type": "Normal", "object": {"kind":"Lease","namespace":"default","name":"0a2dfd6b.al-assad.github.io","uid":"21147764-4dfe-4803-a84c-239a4ad287bb","apiVersion":"coordination.k8s.io/v1","resourceVersion":"173121"}, "reason": "LeaderElection"}
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "doriscluster", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisCluster", "source": "kind source: *v1beta1.DorisCluster"}                        
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "doriscluster", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisCluster", "source": "kind source: *v1.StatefulSet"}                              
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorismonitor", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisMonitor", "source": "kind source: *v1beta1.DorisMonitor"}
2023-10-30T14:26:02Z    INFO    Starting Controller     {"controller": "doriscluster", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisCluster"}
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorismonitor", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisMonitor", "source": "kind source: *v1.DaemonSet"}
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorismonitor", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisMonitor", "source": "kind source: *v1.Deployment"}                               
2023-10-30T14:26:02Z    INFO    Starting Controller     {"controller": "dorismonitor", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisMonitor"}
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorisautoscaler", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisAutoscaler", "source": "kind source: *v1beta1.DorisAutoscaler"}
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorisautoscaler", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisAutoscaler", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2023-10-30T14:26:02Z    INFO    Starting Controller     {"controller": "dorisautoscaler", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisAutoscaler"}
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorisinitializer", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisInitializer", "source": "kind source: *v1beta1.DorisInitializer"}            
2023-10-30T14:26:02Z    INFO    Starting EventSource    {"controller": "dorisinitializer", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisInitializer", "source": "kind source: *v1.Job"}
2023-10-30T14:26:02Z    INFO    Starting Controller     {"controller": "dorisinitializer", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisInitializer"}
2023-10-30T14:26:02Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache       {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: autoscaling/v2: the server could not find the requested resource"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:68
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/loop.go:62
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/loop.go:63
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
        /go/pkg/mod/k8s.io/apimachinery@v0.27.2/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/source/kind.go:56
2023-10-30T14:26:02Z    INFO    Starting workers        {"controller": "dorismonitor", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisMonitor", "worker count": 1}
2023-10-30T14:26:02Z    INFO    Starting workers        {"controller": "dorisinitializer", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisInitializer", "worker count": 1}
2023-10-30T14:26:02Z    INFO    Starting workers        {"controller": "doriscluster", "controllerGroup": "al-assad.github.io", "controllerKind": "DorisCluster", "worker count": 1}
2023-10-30T14:26:12Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache       {"error": "failed to get API group resources: unable to retrieve the complete list of server APIs: autoscaling/v2: the server could not find the requested resource"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
......

Al-assad commented 1 year ago

I see! DorisAutoscalerController watches the Kubernetes HPA v2 resource, but the autoscaling/v2 API is only served from Kubernetes 1.23 onward, and you're using Kubernetes 1.16, so you get this error.

🤔 Maybe we should check the current Kubernetes version when launching the operator controller. If the cluster does not serve autoscaling/v2 (versions below 1.23), skip watching HPA v2 resources.
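
A guard like this could also be written against the API versions the cluster actually serves instead of comparing version numbers, which matches the "autoscaling/v2: the server could not find the requested resource" error in the log. The following is only a minimal sketch under stated assumptions, not the operator's actual code: the function names servesGroupVersion and controllersToStart are hypothetical, and the served lists are hard-coded sample data. In a real operator the list would presumably come from the API server (for example through client-go's discovery client, or what kubectl api-versions prints).

```go
package main

import "fmt"

// hpaV2GroupVersion is the group/version named in the error log above.
const hpaV2GroupVersion = "autoscaling/v2"

// servesGroupVersion reports whether want appears in served, a list of
// "group/version" strings advertised by the API server.
// Hypothetical helper, not part of doris-operator.
func servesGroupVersion(served []string, want string) bool {
	for _, gv := range served {
		if gv == want {
			return true
		}
	}
	return false
}

// controllersToStart returns the controller names to register. The
// autoscaler controller is included only when the cluster serves
// autoscaling/v2, so its HPA v2 watch can never fail on older clusters.
// Controller names are taken from the log output; the function itself
// is a hypothetical illustration.
func controllersToStart(served []string) []string {
	names := []string{"doriscluster", "dorismonitor", "dorisinitializer"}
	if servesGroupVersion(served, hpaV2GroupVersion) {
		names = append(names, "dorisautoscaler")
	}
	return names
}

func main() {
	// Assumed sample data: a cluster without and with autoscaling/v2.
	oldCluster := []string{"apps/v1", "batch/v1", "autoscaling/v1", "autoscaling/v2beta2"}
	newCluster := []string{"apps/v1", "batch/v1", "autoscaling/v1", "autoscaling/v2"}

	fmt.Println(controllersToStart(oldCluster))
	fmt.Println(controllersToStart(newCluster))
}
```

Checking the served group/version directly avoids hard-coding a version threshold, so the guard keeps working on distributions that enable or disable API groups independently of the release number.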

Al-assad commented 1 year ago

Of course, you can also deploy the operator on Kubernetes 1.23 or later to avoid this issue. I will probably fix this problem by this weekend.

Phoenix500526 commented 1 year ago

Maybe I can open a pull request to fix it. @Al-assad

Al-assad commented 1 year ago

Maybe I can open a pull request to fix it. @Al-assad

Wow, thanks very much. Looking forward to your work :)