tigera / operator

Kubernetes operator for installing Calico and Calico Enterprise
Apache License 2.0

[SOLVED] Issue migrating to Tigera Operator, IPAMCONFIGURATION not found #3276

Open amapi opened 8 months ago

amapi commented 8 months ago

We decided to migrate from a manifest-based Calico installation to the Tigera operator on a Kubernetes cluster with Windows nodes.

Everything works on the Linux nodes (Calico runs fine).

Nothing works on the Windows node.

No calico-node-win pod or DaemonSet is created.

The Tigera operator pod reports this error message:

"Waiting for IPAMConfiguration watch to be established"

Cluster before migration: K8S 1.27.6, Calico 3.27.2 (installed with the manifest).

Calico runs fine on both the Linux and Windows nodes.

Cluster after the Tigera operator migration: K8S 1.27.6, Calico 3.27.2. Calico is OK on the Linux nodes, but nothing runs on the Windows node.

root@kwinlabmasaz0a01:~# kubectl get pod -A -o wide | grep calico
calico-system           calico-kube-controllers-8659d8b4b4-mwh2c                   1/1     Running                      0             47m     172.12.26.7      kwinlabmasaz0b01   <none>           <none>
calico-system           calico-node-2ksvj                                          1/1     Running                      0             38m     10.235.76.206    kwinlabingaz0a01   <none>           <none>
calico-system           calico-node-82hll                                          1/1     Running                      0             41m     10.235.76.218    kwinlabwrkaz0a02   <none>           <none>
calico-system           calico-node-9mjds                                          1/1     Running                      0             39m     10.235.76.208    kwinlabwrkaz0a01   <none>           <none>
calico-system           calico-node-t5lx8                                          1/1     Running                      0             40m     10.235.76.209    kwinlabmasaz0b01   <none>           <none>
calico-system           calico-node-tnc4f                                          1/1     Running                      0             39m     10.235.76.204    kwinlabwrkaz0a03   <none>           <none>
calico-system           calico-node-tnzw6                                          1/1     Running                      0             41m     10.235.76.199    kwinlabwrkaz0a04   <none>           <none>
calico-system           calico-node-vgx2g                                          1/1     Running                      0             40m     10.235.76.217    kwinlabmasaz0c01   <none>           <none>
calico-system           calico-node-vp2wn                                          1/1     Running                      0             38m     10.235.76.214    kwinlabmasaz0a01   <none>           <none>
calico-system           calico-typha-5bdbbcfb6b-6lsh2                              1/1     Running                      0             41m     10.235.76.204    kwinlabwrkaz0a03   <none>           <none>
calico-system           calico-typha-5bdbbcfb6b-c7mpp                              1/1     Running                      0             41m     10.235.76.208    kwinlabwrkaz0a01   <none>           <none>
calico-system           calico-typha-5bdbbcfb6b-wxknk                              1/1     Running                      0             41m     10.235.76.214    kwinlabmasaz0a01   <none>           <none>
calico-system           csi-node-driver-4gp2j                                      2/2     Running                      0             47m     172.12.255.200   kwinlabingaz0a01   <none>           <none>
calico-system           csi-node-driver-68kv9                                      2/2     Running                      0             47m     172.12.75.176    kwinlabwrkaz0a04   <none>           <none>
calico-system           csi-node-driver-bcc5b                                      2/2     Running                      0             47m     172.12.26.6      kwinlabmasaz0b01   <none>           <none>
calico-system           csi-node-driver-fqgk5                                      2/2     Running                      0             47m     172.12.226.196   kwinlabmasaz0c01   <none>           <none>
calico-system           csi-node-driver-kv47h                                      2/2     Running                      0             47m     172.12.106.130   kwinlabwrkaz0a03   <none>           <none>
calico-system           csi-node-driver-ppsff                                      2/2     Running                      0             47m     172.12.7.135     kwinlabmasaz0a01   <none>           <none>
calico-system           csi-node-driver-qlltc                                      2/2     Running                      0             47m     172.12.173.171   kwinlabwrkaz0a01   <none>           <none>
calico-system           csi-node-driver-wthmr                                      2/2     Running                      0             47m     172.12.204.244   kwinlabwrkaz0a02   <none>           <none>
root@kwinlabmasaz0a01:~# kubectl  get node -o wide
NAME               STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
kwinlabaz0a01      Ready    <none>                 9h    v1.25.6   10.235.76.216   <none>        Windows Server 2022 Standard   10.0.20348.1906     containerd://1.6.1
kwinlabingaz0a01   Ready    <none>                 11h   v1.25.6   10.235.76.206   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabmasaz0a01   Ready    control-plane,master   11h   v1.25.6   10.235.76.214   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabmasaz0b01   Ready    control-plane,master   11h   v1.25.6   10.235.76.209   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabmasaz0c01   Ready    control-plane,master   11h   v1.25.6   10.235.76.217   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabwrkaz0a01   Ready    <none>                 10h   v1.25.6   10.235.76.208   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabwrkaz0a02   Ready    <none>                 10h   v1.25.6   10.235.76.218   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabwrkaz0a03   Ready    <none>                 10h   v1.25.6   10.235.76.204   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
kwinlabwrkaz0a04   Ready    <none>                 10h   v1.25.6   10.235.76.199   <none>        Ubuntu 22.04.1 LTS             5.15.0-58-generic   containerd://1.7.5
root@kwinlabmasaz0a01:~# kubectl logs -f tigera-operator-7c67d76845-rrvsj -n tigera-operator

{"level":"error","ts":"2024-03-28T20:18:35Z","logger":"controller_windows","msg":"Waiting for IPAMConfiguration watch to be established","reason":"ResourceNotReady","stacktrace":"github.com/tigera/operator/pkg/controller/status.(*statusManager).SetDegraded\n\t/go/src/github.com/tigera/operator/pkg/controller/status/status.go:356\ngithub.com/tigera/operator/pkg/controller/installation.(*ReconcileWindows).Reconcile\n\t/go/src/github.com/tigera/operator/pkg/controller/installation/windows_controller.go:328\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"}

tmjd commented 8 months ago

Please provide more info. How did you migrate? What directions did you follow? Do you have the IPAMConfiguration resource type in your cluster? Have you reviewed the tigerastatus? (Check it with kubectl get tigerastatus; you can also add -o yaml to the kubectl command to get more info.)

Please include any additional info you find.

I would expect that the IPAMConfiguration is the problem here.

I'm surprised your old windows pods aren't still present on the windows nodes, but I'm not too familiar with how the migration works with windows. @coutinhop does it make sense that the old windows pods aren't present anymore even though the windows controller isn't running?

amapi commented 8 months ago

Hello, these are the answers.

A. What directions did you follow?

I followed the Tigera operator migration procedure on the Calico website. I applied the manifest with my custom parameters (API server address and port, CIDR, etc.). Everything works fine on the Linux nodes, but nothing appears for the Windows node: the Windows calico-node DaemonSet does not spawn at all, and there is nothing about this DaemonSet in the event logs.

B. Do you have the IPAMConfiguration resource type in your cluster?

I do not have an IPAMConfiguration object, but I do have an IPAMConfig. I don't know if it is the same one you are talking about.

root@kwinlabmasaz0a01:~/calico/base/tigera-operator# k get ipamconfig
NAME      AGE
default   28h
root@kwinlabmasaz0a01:~/calico/base/tigera-operator# k get ipamconfig default -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPAMConfig
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":null}'
  creationTimestamp: "2024-03-28T08:40:04Z"
  generation: 2
  name: default
  resourceVersion: "462"
  uid: 87108bbe-599a-4f21-a3c6-3aa74c073ff5
spec:
  autoAllocateBlocks: true
  strictAffinity: true
root@kwinlabmasaz0a01:~/calico/base/tigera-operator# k get IPAMConfiguration
error: the server doesn't have a resource type "IPAMConfiguration"
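
For context on the two names: `IPAMConfig` in the `crd.projectcalico.org/v1` group is the CRD shown above, while `IPAMConfiguration` appears to be the `projectcalico.org/v3` kind, which the cluster only serves once the Calico API server is running. The sketch below is one way to check which resource names a cluster serves. `has_resource` is a hypothetical helper (not part of kubectl or Calico), and the plural name `ipamconfigurations.projectcalico.org` is an assumption about how the kind is registered.

```shell
#!/bin/sh
# has_resource NAME
# Reads the output of `kubectl api-resources -o name` on stdin and succeeds
# only if NAME is listed exactly (fixed-string, whole-line match).
has_resource() {
  grep -qxF -- "$1"
}

# Example against a live cluster (not run here; resource name is an assumption):
#   kubectl api-resources --api-group=projectcalico.org -o name \
#     | has_resource ipamconfigurations.projectcalico.org \
#     && echo "IPAMConfiguration is served" \
#     || echo "IPAMConfiguration is not served"
```

If the resource is not listed, `kubectl get IPAMConfiguration` fails with the "doesn't have a resource type" error seen above, even though the `IPAMConfig` CRD exists.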

C. Have you reviewed the tigerastatus?

root@kwinlabmasaz0a01:~/calico/base/tigera-operator# k get tigerastatus
NAME             AVAILABLE   PROGRESSING   DEGRADED   SINCE
calico           True        False         False      165m
calico-windows                             True
root@kwinlabmasaz0a01:~/calico/base/tigera-operator# k get tigerastatus calico-windows -o yaml
apiVersion: operator.tigera.io/v1
kind: TigeraStatus
metadata:
  creationTimestamp: "2024-03-29T08:45:58Z"
  generation: 1
  name: calico-windows
  resourceVersion: "457556"
  uid: 2fb18a88-d71c-4fd5-a30d-060c17338f5f
spec: {}
status:
  conditions:
  - lastTransitionTime: "2024-03-29T08:46:03Z"
    message: 'Waiting for IPAMConfiguration watch to be established: '
    reason: ResourceNotReady
    status: "True"
    type: Degraded
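
The Degraded condition shown here can also be listed for every component in one pass. This is a small sketch, assuming its input is two whitespace-separated columns (component name, Degraded status) such as a `custom-columns` query would produce. `degraded_components` is a hypothetical helper name, not an operator or kubectl feature.

```shell
#!/bin/sh
# degraded_components
# Reads "NAME STATUS" pairs on stdin and prints each NAME whose
# Degraded status is exactly "True".
degraded_components() {
  awk '$2 == "True" { print $1 }'
}

# Example against a live cluster (not run here):
#   kubectl get tigerastatus --no-headers \
#     -o custom-columns='NAME:.metadata.name,DEGRADED:.status.conditions[?(@.type=="Degraded")].status' \
#     | degraded_components
```

Filtering on the condition type avoids relying on the default table output, where empty AVAILABLE and PROGRESSING cells make column positions ambiguous.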

Other information

root@kwinlabmasaz0a01:~# k get ds -A
NAMESPACE              NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                            AGE
calico-system          calico-node                              8         8         8       8            8           kubernetes.io/os=linux                   5h8m
calico-system          csi-node-driver                          8         8         8       8            8           kubernetes.io/os=linux                   5h8m
fluent                 fluent-bit                               8         8         8       8            8           kubernetes.io/os=linux                   26h
fluent                 fluent-bit-windows2022                   1         1         0       0            0           kubernetes.io/os=windows                 26h
ingress-nginx          ingress-nginx-controller                 1         1         1       1            1           kubernetes.io/os=linux,role=ingress      27h
kube-ingress-traefik   traefik                                  1         1         1       1            1           role=ingress                             27h
kube-mon               kube-mon                                 9         9         8       9            8           <none>                                   26h
kube-system            csi-cinder-nodeplugin                    8         8         8       8            8           kubernetes.io/os=linux                   28h
kube-system            openstack-cloud-controller-manager       3         3         3       3            3           node-role.kubernetes.io/control-plane=   28h
loki-canary            loki-canary                              0         0         0       0            0           loki-canary=true                         26h
monitoring             prometheus-prometheus-node-exporter      8         8         8       8            8           kubernetes.io/os=linux                   27h
monitoring             prometheus-prometheus-windows-exporter   1         1         1       1            1           kubernetes.io/os=windows                 27h

coutinhop commented 8 months ago

@amapi what steps did you follow to migrate to windows operator? Was it these https://docs.tigera.io/calico/latest/getting-started/kubernetes/windows-calico/operator ? (Meaning: have you already added the required configuration for windows and enabled it in the installation resource?)

Can you post more complete operator logs, and the output of kubectl get tigerastatus -o yaml as @tmjd suggested? We have to figure out why the operator could not establish that IPAMConfiguration watch...

amapi commented 8 months ago

(Meaning: have you already added the required configuration for windows and enabled it in the installation resource?)

Calico was already installed on the Windows node. The installation was done following the Calico documentation (I can detail it here, but I would just be copying and pasting from the Calico website).

For the Calico operator: I updated to K8s 1.27.6, updated Calico to 3.27.2 with the manifest, and then applied tigera-operator.yaml (following https://docs.tigera.io/calico/latest/operations/operator-migration#operator-migration).

I have the Installation object:

root@kwinlabmasaz0a01:~# k get installation
NAME      AGE
default   5h14m
root@kwinlabmasaz0a01:~# k get installation default -o yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  creationTimestamp: "2024-03-29T08:30:59Z"
  finalizers:
  - tigera.io/operator-cleanup
  generation: 10
  name: default
  resourceVersion: "610882"
  uid: 79ca9d60-05ca-4f86-af80-69c600ce3881
spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - blockSize: 26
      cidr: 172.12.0.0/16
      disableBGPExport: false
      encapsulation: None
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    mtu: 0
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      firstFound: true
    windowsDataplane: HNS
  cni:
    ipam:
      type: Calico
    type: Calico
  componentResources:
  - componentName: Node
    resourceRequirements:
      requests:
        cpu: 250m
  controlPlaneReplicas: 2
  flexVolumePath: None
  kubeletVolumePluginPath: /var/lib/kubelet
  logging:
    cni:
      logFileMaxAgeDays: 30
      logFileMaxCount: 10
      logFileMaxSize: 100Mi
      logSeverity: Info
  nodeUpdateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Disabled
  registry: docker-proxy-asis-coreos.popob1.repos.tech.orange/
  serviceCIDRs:
  - 10.254.0.0/16
  variant: Calico
  windowsNodes:
    cniBinDir: /opt/cni/bin
    cniConfigDir: /etc/cni/net.d
    cniLogDir: /var/log/calico/cni
status:
  calicoVersion: v3.27.2
  computed:
    calicoNetwork:
      bgp: Enabled
      hostPorts: Enabled
      ipPools:
      - blockSize: 26
        cidr: 172.12.0.0/16
        disableBGPExport: false
        encapsulation: None
        natOutgoing: Enabled
        nodeSelector: all()
      linuxDataplane: Iptables
      mtu: 0
      multiInterfaceMode: None
      nodeAddressAutodetectionV4:
        firstFound: true
      windowsDataplane: HNS
    cni:
      ipam:
        type: Calico
      type: Calico
    componentResources:
    - componentName: Node
      resourceRequirements:
        requests:
          cpu: 250m
    controlPlaneReplicas: 2
    flexVolumePath: None
    kubeletVolumePluginPath: /var/lib/kubelet
    logging:
      cni:
        logFileMaxAgeDays: 30
        logFileMaxCount: 10
        logFileMaxSize: 100Mi
        logSeverity: Info
    nodeUpdateStrategy:
      rollingUpdate:
        maxSurge: 0
        maxUnavailable: 1
      type: RollingUpdate
    nonPrivileged: Disabled
    registry: docker-proxy-asis-coreos.popob1.repos.tech.orange/
    serviceCIDRs:
    - 10.254.0.0/16
    variant: Calico
    windowsNodes:
      cniBinDir: /opt/cni/bin
      cniConfigDir: /etc/cni/net.d
      cniLogDir: /var/log/calico/cni
  conditions:
  - lastTransitionTime: "2024-03-29T10:51:51Z"
    message: All Objects Available
    observedGeneration: 10
    reason: AllObjectsAvailable
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-03-29T10:51:51Z"
    message: All objects available
    observedGeneration: 10
    reason: AllObjectsAvailable
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-03-29T10:51:51Z"
    message: All Objects Available
    observedGeneration: 10
    reason: AllObjectsAvailable
    status: "False"
    type: Progressing
  variant: Calico

amapi commented 8 months ago

I have edited my previous message and added a new response.

tmjd commented 8 months ago

I think you need to deploy the APIServer. I think kubectl create APIServer default will probably do the trick. If not you can create a file with

apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}

and then kubectl apply that file
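
The file-based route can also be done in one step by piping the manifest to `kubectl apply -f -`. This sketch uses the same manifest as above. `apiserver_manifest` is a hypothetical helper name. As far as I know, `kubectl create` only has subcommands for built-in kinds, so applying a manifest is likely the more dependable of the two options.

```shell
#!/bin/sh
# apiserver_manifest
# Prints the minimal APIServer custom resource quoted in this thread.
apiserver_manifest() {
  cat <<'EOF'
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
EOF
}

# Example against a live cluster (not run here):
#   apiserver_manifest | kubectl apply -f -
```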

amapi commented 8 months ago

I'm testing; I will report back when done.

amapi commented 8 months ago

This is IT. Thanks a lot.

Solution: Apply

apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
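
One way to confirm the result afterwards is to look for a Windows DaemonSet in the calico-system namespace. The thread does not give the exact DaemonSet name, so this sketch matches any name containing "windows". `windows_daemonsets` is a hypothetical helper name.

```shell
#!/bin/sh
# windows_daemonsets
# Reads `kubectl get daemonset -o name` output on stdin and prints the entries
# whose name contains "windows" (case-insensitive). Prints nothing, and still
# succeeds, when there is no match.
windows_daemonsets() {
  grep -i 'windows' || true
}

# Example against a live cluster (not run here):
#   kubectl get daemonset -n calico-system -o name | windows_daemonsets
```

An empty result would suggest that the Windows controller is still degraded, in which case the calico-windows TigeraStatus is the next place to look.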