siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Calico fails, read only file system #6902

Closed senare closed 1 year ago

senare commented 1 year ago

Feature Request

I am filing this as a feature request because I may simply need documentation on how to install Calico.

It could of course be something else!

Description

I am trying to install Talos with the Calico CNI.

talosctl -n 91.123.202.99 version

Client:
    Tag:         v1.2.7
    SHA:         facc3d12
    Built:
    Go version:  go1.19.2
    OS/Arch:     linux/amd64
Server:
    NODE:        91.123.202.99
    Tag:         v1.3.5
    SHA:         03edf8c1
    Built:
    Go version:  go1.19.6
    OS/Arch:     linux/amd64
    Enabled:     RBAC

and my patch for the CNI:

cluster:
  network:
    cni:
      name: custom
      urls:
        - https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/tigera-operator.yaml
        - https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/custom-resources.yaml

This fails unless I make the operator namespace privileged:

kubectl label --overwrite ns tigera-operator pod-security.kubernetes.io/enforce=privileged
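For reference, the same label can also be written declaratively. This is only a sketch: the helper name `privileged_namespace_manifest` is made up for illustration, and it assumes that applying a Namespace manifest on top of the namespace created by tigera-operator.yaml is acceptable in your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or Talos): print a Namespace
# manifest carrying the same Pod Security Admission label that the
# `kubectl label` command above sets imperatively.
privileged_namespace_manifest() {
  ns="$1"
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
  labels:
    pod-security.kubernetes.io/enforce: privileged
EOF
}

# Example (not run here; needs cluster access):
#   privileged_namespace_manifest tigera-operator | kubectl apply -f -
```

Keeping the label in a manifest means it can live next to the other cluster manifests instead of being a manual step after install.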

namespace <calico-apiserver>

# k get all 
NAME                                   READY   STATUS    RESTARTS   AGE
pod/calico-apiserver-78667c9d6-tmtd5   1/1     Running   0          111m
pod/calico-apiserver-78667c9d6-wxvvw   1/1     Running   0          111m

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/calico-api   ClusterIP   10.108.85.187   <none>        443/TCP   111m

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-apiserver   2/2     2            2           111m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/calico-apiserver-78667c9d6   2         2         2       111m

namespace <calico-system>

# kubectl get all 
NAME                                           READY   STATUS                 RESTARTS       AGE
pod/calico-kube-controllers-6b7b9c649d-jf67z   1/1     Running                1 (109m ago)   111m
pod/calico-node-5z52j                          1/1     Running                0              111m
pod/calico-node-bs4r9                          1/1     Running                0              111m
pod/calico-node-gd667                          1/1     Running                1 (110m ago)   111m
pod/calico-node-gkgxj                          1/1     Running                1 (110m ago)   111m
pod/calico-node-hn5kj                          1/1     Running                0              111m
pod/calico-node-qmctj                          1/1     Running                1 (109m ago)   111m
pod/calico-typha-56958f9cd-djk6j               1/1     Running                0              111m
pod/calico-typha-56958f9cd-lgl2q               1/1     Running                0              111m
pod/calico-typha-56958f9cd-ztcwk               1/1     Running                0              111m
pod/csi-node-driver-blrbt                      1/2     CreateContainerError   0              111m
pod/csi-node-driver-dfkv8                      1/2     CreateContainerError   0              111m
pod/csi-node-driver-nqv7z                      1/2     CreateContainerError   0              111m
pod/csi-node-driver-s5jgf                      1/2     CreateContainerError   0              110m
pod/csi-node-driver-t5b4l                      1/2     CreateContainerError   1 (109m ago)   110m
pod/csi-node-driver-wgr2v                      1/2     CreateContainerError   0              109m

NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/calico-kube-controllers-metrics   ClusterIP   None            <none>        9094/TCP   109m
service/calico-typha                      ClusterIP   10.106.62.130   <none>        5473/TCP   111m

NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/calico-node       6         6         6       6            6           kubernetes.io/os=linux   111m
daemonset.apps/csi-node-driver   6         6         0       6            0           kubernetes.io/os=linux   111m

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-kube-controllers   1/1     1            1           111m
deployment.apps/calico-typha              3/3     3            3           111m

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/calico-kube-controllers-6b7b9c649d   1         1         1       111m
replicaset.apps/calico-typha-56958f9cd               3         3         3       111m
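
With this many resources listed, the unhealthy DaemonSet is easy to miss. A rough sketch of a filter that prints only the DaemonSets whose READY count differs from DESIRED; it assumes the `kubectl get all` column order shown above (NAME, DESIRED, CURRENT, READY), and `unready_daemonsets` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get all` output on stdin and print
# each DaemonSet whose READY column ($4) differs from DESIRED ($2),
# formatted as "<name> <ready>/<desired>".
unready_daemonsets() {
  awk '$1 ~ /^daemonset\.apps\// && $2 != $4 { print $1, $4 "/" $2 }'
}

# Example (not run here; needs cluster access):
#   kubectl -n calico-system get all | unready_daemonsets
```

Run against the listing above, this would single out csi-node-driver, which is the DaemonSet behind the CreateContainerError pods.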

senare commented 1 year ago

Is this related to this issue?

https://github.com/siderolabs/talos/issues/6729

The one about the read-only filesystem?

How do I debug it?

How do I solve it?

frezbo commented 1 year ago

Not sure why you'd need the operator, or why the operator needs elevated privileges. The canal quick-install manifest works: https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/canal.yaml

senare commented 1 year ago

# k describe pod/csi-node-driver-7rpcf
Name:                 csi-node-driver-7rpcf
Namespace:            calico-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      default
Node:                 talos-worker-0/172.16.0.151
Start Time:           Thu, 02 Mar 2023 10:14:12 +0000
Labels:               app.kubernetes.io/name=csi-node-driver
                      controller-revision-hash=7c755b465f
                      k8s-app=csi-node-driver
                      name=csi-node-driver
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/containerID: ead1ab107e90d1967e0f17371284719443de045b61b6a8804bf14e51f42b12d8
                      cni.projectcalico.org/podIP: 192.168.185.133/32
                      cni.projectcalico.org/podIPs: 192.168.185.133/32
Status:               Pending
IP:                   192.168.185.133
IPs:
  IP:           192.168.185.133
Controlled By:  DaemonSet/csi-node-driver
Containers:
  calico-csi:
    Container ID:  
    Image:         docker.io/calico/csi:v3.25.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      --nodeid=$(KUBE_NODE_NAME)
      --loglevel=$(LOG_LEVEL)
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Environment:
      LOG_LEVEL:       warn
      KUBE_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /csi from socket-dir (rw)
      /etc/calico from etccalico (rw)
      /var/lib/kubelet from kubelet-dir (rw)
      /var/run from varrun (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-l7zb4 (ro)
  csi-node-driver-registrar:
    Container ID:  containerd://4b26963c5c7a6b3aa59a54f4d398bd349df9f6e649b88481bf1e15de5d6a8a46
    Image:         docker.io/calico/node-driver-registrar:v3.25.0
    Image ID:      docker.io/calico/node-driver-registrar@sha256:f559ee53078266d2126732303f588b9d4266607088e457ea04286f31727676f7
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Running
      Started:      Thu, 02 Mar 2023 10:14:13 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi.tigera.io/csi.sock
      KUBE_NODE_NAME:         (v1:spec.nodeName)
    Mounts:
      /csi from socket-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-l7zb4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  varrun:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run
    HostPathType:  
  etccalico:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/calico
    HostPathType:  
  kubelet-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.tigera.io
    HostPathType:  DirectoryOrCreate
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  kube-api-access-l7zb4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  21s               default-scheduler  Successfully assigned calico-system/csi-node-driver-7rpcf to talos-worker-0
  Warning  Failed     21s               kubelet            Error: failed to generate container "f541f71a86aa00c5c120ead1cb85c4dba81897c857f49af3113c0fb6d81e8bb7" spec: failed to generate spec: failed to mkdir "/etc/calico": mkdir /etc/calico: read-only file system
  Normal   Pulled     21s               kubelet            Container image "docker.io/calico/node-driver-registrar:v3.25.0" already present on machine
  Normal   Created    21s               kubelet            Created container csi-node-driver-registrar
  Normal   Started    21s               kubelet            Started container csi-node-driver-registrar
  Warning  Failed     21s               kubelet            Error: failed to generate container "0212996e0519e17d48ccc2cf1ebfb93882bfea3de38a7862e63d73a61892f585" spec: failed to generate spec: failed to mkdir "/etc/calico": mkdir /etc/calico: read-only file system
  Warning  Failed     20s               kubelet            Error: failed to generate container "1cb749ea4f998f782d23985e92b18b0192f6059c6517655db8d362d9cbb1a799" spec: failed to generate spec: failed to mkdir "/etc/calico": mkdir /etc/calico: read-only file system
  Normal   Pulled     7s (x4 over 21s)  kubelet            Container image "docker.io/calico/csi:v3.25.0" already present on machine
  Warning  Failed     7s                kubelet            Error: failed to generate container "4d364cd6fc36545368b5df94b65cccdb6f698f66897c71b75b0571a10e5b5271" spec: failed to generate spec: failed to mkdir "/etc/calico": mkdir /etc/calico: read-only file system
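
To pull just the offending host paths out of events like the ones above, a small filter can help. A minimal sketch, assuming the kubelet message keeps the `failed to mkdir "<path>"` wording seen in these events; `readonly_mkdir_paths` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl describe pod` output on stdin and
# print each distinct path that could not be created because the host
# filesystem is read-only.
readonly_mkdir_paths() {
  grep 'read-only file system' \
    | sed -n 's/.*failed to mkdir "\([^"]*\)".*/\1/p' \
    | sort -u
}

# Example (not run here; needs cluster access):
#   kubectl -n calico-system describe pod csi-node-driver-7rpcf | readonly_mkdir_paths
```

For the events above this reduces the output to /etc/calico, which matches the etccalico HostPath volume in the pod spec.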

senare commented 1 year ago

> Not sure why you'd need the operator, or why the operator needs elevated privileges. The canal quick-install manifest works: https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/canal.yaml

I will try that instead, then, I guess!

And I really don't know either. Are there any good sources of information on networking with Talos? My problems are not so much with Talos itself as with making it work in my environment.

senare commented 1 year ago

I am running on OpenStack, and my limited understanding is that one possible cause of my problems is the CNI, i.e. the default Flannel, which is based on UDP tunnels (?). So I want to move to IPIP (Cilium, Calico). But if I install Cilium with canal, am I still using Flannel, i.e. UDP tunnels?

uhthomas commented 1 year ago

https://github.com/tigera/operator/issues/2444

uhthomas commented 1 year ago

This issue can be closed now.