streamnative / function-mesh

The serverless framework purpose-built for event streaming applications.
https://functionmesh.io/
Apache License 2.0
210 stars, 27 forks

[Bug] - function-mesh-controller-manager-service keeps crashing #666

Closed david-streamlio closed 1 year ago

david-streamlio commented 1 year ago

Environment:

OS: Ubuntu 22.04
K8s: 1.25.11
Kubernetes Distro: microk8s
Function Mesh: v0.14.0

Behavior

The function-mesh-controller-manager-service keeps crashing and getting reassigned a new IP address:

david@kubernetes:~/sn-function-mesh-operator$ for i in {1..5}; do kubectl get svc -n operators | grep function-mesh-controller; sleep 10; done
function-mesh-controller-manager-service                 ClusterIP   10.152.183.20    <none>        443/TCP    7s
function-mesh-controller-manager-service                 ClusterIP   10.152.183.224   <none>        443/TCP    5s
function-mesh-controller-manager-service                 ClusterIP   10.152.183.224   <none>        443/TCP    15s
function-mesh-controller-manager-service                 ClusterIP   10.152.183.32    <none>        443/TCP    7s
function-mesh-controller-manager-service                 ClusterIP   10.152.183.93    <none>        443/TCP    1s

Steps to reproduce

Install the Function Mesh Operator following the documentation, starting with cert-manager: helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.8.0 --set installCRDs=true

Then install the Function Mesh operator with Helm: helm install function-mesh function-mesh/function-mesh-operator --namespace operators --set installCRDs=true

Check the IP of the function-mesh-controller-manager-service, and observe it changing constantly.
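The check in the last step can be reduced to a single number. This is a minimal sketch, assuming the default column layout of `kubectl get svc` (NAME, TYPE, CLUSTER-IP, ...); `distinct_cluster_ips` is a hypothetical helper name, not part of kubectl. It counts how many different ClusterIPs appear across the sampled lines, so a stable Service yields 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads sampled `kubectl get svc` rows on stdin and
# prints the number of distinct ClusterIPs seen.
distinct_cluster_ips() {
  # Keep only ClusterIP-type rows; column 3 is CLUSTER-IP in the default output.
  awk '$2 == "ClusterIP" { print $3 }' | sort -u | wc -l | tr -d ' '
}

# Usage against a live cluster (assumes the `operators` namespace from this report):
#   for i in {1..5}; do
#     kubectl get svc -n operators --no-headers | grep function-mesh-controller
#     sleep 10
#   done | distinct_cluster_ips
```

Fed the five sample rows from the Behavior section, this prints 4. The AGE column in those rows also stays at a few seconds, which suggests the Service object itself is being recreated between samples.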

Impact

This makes it impossible to deploy anything to Function Mesh using kubectl apply because the IP address keeps changing. If you run kubectl apply -f compute_v1alpha1_functionmesh.yaml -n operators, the pods never get created, and the following error appears in the logs of the function-mesh-controller-manager- pod:

1.689277604928055e+09   ERROR   controllers.FunctionMesh    failed to handle function   {"name": "ex1", "action": "Create", "error": "Internal error occurred: failed calling webhook \"mfunction.kb.io\": failed to call webhook: Post \"https://function-mesh-controller-manager-service.operators.svc:443/mutate-compute-functionmesh-io-v1alpha1-function?timeout=10s\": dial tcp 10.152.183.156:443: connect: connection refused"}
github.com/streamnative/function-mesh/controllers.(*FunctionMeshReconciler).Reconcile
    github.com/streamnative/function-mesh/controllers/functionmesh_controller.go:96
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    sigs.k8s.io/controller-runtime@v0.12.3/pkg/internal/controller/controller.go:234

nlu90 commented 1 year ago

Could you check whether your k8s cluster has the following:

➜  samples git:(branch-0.14) ✗ kubectl get Validatingwebhookconfigurations -A
NAME                                             WEBHOOKS   AGE
function-mesh-validating-webhook-configuration   3          12m
➜  samples git:(branch-0.14) ✗ kubectl get Mutatingwebhookconfigurations -A
NAME                                           WEBHOOKS   AGE
function-mesh-mutating-webhook-configuration   3          12m

Also check whether your deployment has the ENABLE_WEBHOOKS env variable set to "true":

➜  samples git:(branch-0.14) ✗ kubectl get Deployments function-mesh-controller-manager -n function-mesh -oyaml
...
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: ENABLE_WEBHOOKS
          value: "true"

david-streamlio commented 1 year ago

david@kubernetes:~/sn-function-mesh-operator$ kubectl get Validatingwebhookconfigurations -A | grep function
vfunction.kb.io-4gzcc                                  1          5h24m
function-mesh-validating-webhook-configuration         3          124m

david-streamlio commented 1 year ago

david@kubernetes:~/sn-function-mesh-operator$ kubectl get Mutatingwebhookconfigurations -A | grep function
mfunction.kb.io-57gds                                1          5h25m
function-mesh-mutating-webhook-configuration         3          125m

david-streamlio commented 1 year ago

kubectl get Deployments function-mesh-controller-manager -n operators -oyaml
...
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: ENABLE_WEBHOOKS
          value: "true"

david-streamlio commented 1 year ago

Attaching the full log of the function-mesh-controller-manager- pod: operator-log.txt

nlu90 commented 1 year ago

We debugged this offline. A conflicting Function Mesh controller installed by OLM was hijacking the requests. After the OLM-installed operator was cleaned up, the operator works as normal.
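
For anyone hitting the same symptom, here is a minimal sketch of how a second installation might be spotted from the webhook listings earlier in the thread. It assumes the Helm chart creates only the two configurations named function-mesh-validating-webhook-configuration and function-mesh-mutating-webhook-configuration (as in nlu90's output), and flags any other entry whose name mentions "function". The thread does not confirm which objects belonged to the OLM install, so the flagged names are candidates to inspect, not a verdict. `unexpected_webhook_configs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads webhook-configuration rows on stdin and prints the
# names that mention "function" but are not the two Helm-created configurations.
unexpected_webhook_configs() {
  awk '
    $1 ~ /function/ &&
    $1 != "function-mesh-validating-webhook-configuration" &&
    $1 != "function-mesh-mutating-webhook-configuration" { print $1 }
  '
}

# Usage against a live cluster:
#   { kubectl get validatingwebhookconfigurations --no-headers
#     kubectl get mutatingwebhookconfigurations --no-headers; } | unexpected_webhook_configs
```

Run on the rows david-streamlio posted, this prints vfunction.kb.io-4gzcc and mfunction.kb.io-57gds. On clusters where OLM is installed, kubectl get csv -A lists the operators it manages, which may be a reasonable next place to look.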