k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

k3s 0.5 agent node fail to create nginx pod when starting as systemd service #478

Closed vincentmli closed 5 years ago

vincentmli commented 5 years ago

Describe the bug

Running the k3s agent as a systemd service (as below) causes the agent node to fail to create pods.

The k3s agent service is started as:

root@home-ubuntu:/var/lib/rancher# cat /etc/systemd/system/k3s.service 
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/k3s --debug agent -s https://192.168.1.30:6443 -t "K102fcec9c1bd7ecdeb16a471571d2fb3abe6d0a3d49cb69b029d0264ea78a71e3c::node:099ebc33d9626443a5290c3cc146602a"
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target

Deploy the nginx pod and service: kubectl apply -f nginx_cluster_pod_service.yaml

root@Office-R220-vli:/home/vincent# cat nginx_cluster_pod_service.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

---

apiVersion: v1
kind: Service
metadata:
  labels:
    name: nginxservice
  name: nginxservice
spec:
  ports:
    # The port that this service should serve on.
    - port: 80
  selector:
    app: nginx
  type: ClusterIP
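The manifest above uses a ReplicationController, which still works but has long been superseded. For reference only (not part of the original report), the same workload expressed as an apps/v1 Deployment would look roughly like this; the failure reported here is in pod sandbox creation on the agent node, so either controller type should reproduce it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:      # apps/v1 requires an explicit matchLabels selector
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
```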

The nginx pod on the agent node is stuck in ContainerCreating:


root@Office-R220-vli:/home/vincent# kubectl get po -o wide
NAME          READY   STATUS              RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
nginx-jn2mw   0/1     ContainerCreating   0          10m   <none>       home-ubuntu       <none>           <none>
nginx-z7b99   1/1     Running             0          10m   10.42.0.19   office-r220-vli   <none>           <none>
root@Office-R220-vli:/home/vincent# kubectl get no -o wide
NAME              STATUS   ROLES    AGE   VERSION         INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
home-ubuntu       Ready    <none>   11m   v1.14.1-k3s.4   192.168.1.31   <none>        Ubuntu 18.04.1 LTS   4.15.0-47-generic   containerd://1.2.5+unknown
office-r220-vli   Ready    <none>   13m   v1.14.1-k3s.4   192.168.1.30   <none>        Ubuntu 16.04.1 LTS   4.4.0-31-generic    containerd://1.2.5+unknown

Running the k3s agent from the command line as below, the nginx pod is created successfully:

root@home-ubuntu:/var/lib/rancher# /usr/local/bin/k3s --debug agent -s https://192.168.1.30:6443 -t "K102fcec9c1bd7ecdeb16a471571d2fb3abe6d0a3d49cb69b029d0264ea78a71e3c::node:099ebc33d9626443a5290c3cc146602a"

root@Office-R220-vli:/home/vincent# kubectl apply -f nginx_cluster_pod_service.yaml 
replicationcontroller/nginx created
service/nginxservice created

root@Office-R220-vli:/home/vincent# kubectl get po -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
nginx-72ffz   1/1     Running   0          11s   10.42.1.16   home-ubuntu       <none>           <none>
nginx-8jx87   1/1     Running   0          11s   10.42.0.20   office-r220-vli   <none>           <none>

To Reproduce

Steps to reproduce the behavior:

Download k3s 0.5 and run the k3s agent as a systemd service, as in the example above.

Expected behavior

The k3s agent should be able to run as a systemd service and create pods.


k3s agent startup logs from the systemd service:

May 14 20:30:51 home-ubuntu systemd[1]: Starting Lightweight Kubernetes...
May 14 20:30:51 home-ubuntu k3s[11731]: time="2019-05-14T20:30:51Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/4e1224c66a9dbb9b03daefff200f4f8eaf45590fb722b6fe2924a201d6de2e8d"
May 14 20:30:52 home-ubuntu k3s[11731]: time="2019-05-14T20:30:52Z" level=debug msg="Running /var/lib/rancher/k3s/data/4e1224c66a9dbb9b03daefff200f4f8eaf45590fb722b6fe2924a201d6de2e8d/bin/k3s-agent [/usr/local/bin/k3s --debug agent -s https://192.168.1.30:6443 -t K102fcec9c1bd7ecdeb16a471571d2fb3abe6d0a3d49cb69b029d0264ea78a71e3c::node:099ebc33d9626443a5290c3cc146602a]"
May 14 20:30:52 home-ubuntu k3s[11731]: time="2019-05-14T20:30:52.434902781Z" level=info msg="Starting k3s agent v0.5.0 (8c0116dd)"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.215686449Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.220793382Z" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory\""
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.255414211Z" level=info msg="starting containerd" revision= version=1.2.5+unknown
.............CUT.......
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.275667026Z" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntime:{Type:io.containerd.runtime.v1.linux Engine: Root: Options:} UntrustedWorkloadRuntime:{Type: Engine: Root: Options:} Runtimes:map[] NoPivot:false} CniConfig:{NetworkPluginBinDir:/var/lib/rancher/k3s/data/4e1224c66a9dbb9b03daefff200f4f8eaf45590fb722b6fe2924a201d6de2e8d/bin NetworkPluginConfDir:/var/lib/rancher/k3s/agent/etc/cni/net.d NetworkPluginConfTemplate:} Registry:{Mirrors:map[docker.io:{Endpoints:[https://registry-1.docker.io]}] Auths:map[]} StreamServerAddress:home-ubuntu StreamServerPort:10010 EnableSelinux:false SandboxImage:k8s.gcr.io/pause:3.1 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false} ContainerdRootDir:/var/lib/rancher/k3s/agent/containerd ContainerdEndpoint:/run/k3s/containerd/containerd.sock RootDir:/var/lib/rancher/k3s/agent/containerd/io.containerd.grpc.v1.cri StateDir:/run/k3s/containerd/io.containerd.grpc.v1.cri}"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.277992411Z" level=info msg="Connect containerd service"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.278197601Z" level=info msg="Get image filesystem path \"/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs\""
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.278553015Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.278756728Z" level=info msg=serving... address=/run/k3s/containerd/containerd.sock
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.278847346Z" level=info msg="containerd successfully booted in 0.024563s"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.283062933Z" level=info msg="Start subscribing containerd event"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.283188201Z" level=info msg="Start recovering state"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.283345246Z" level=info msg="Start event monitor"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.283429704Z" level=info msg="Start snapshots syncer"
May 14 20:30:53 home-ubuntu k3s[11731]: time="2019-05-14T20:30:53.283507970Z" level=info msg="Start streaming server"
May 14 20:30:54 home-ubuntu k3s[11731]: time="2019-05-14T20:30:54.225127364Z" level=info msg="Connecting to wss://192.168.1.30:6443/v1-k3s/connect"
May 14 20:30:54 home-ubuntu k3s[11731]: time="2019-05-14T20:30:54.225633260Z" level=info msg="Connecting to proxy" url="wss://192.168.1.30:6443/v1-k3s/connect"
May 14 20:30:54 home-ubuntu k3s[11731]: time="2019-05-14T20:30:54.240693604Z" level=info msg="Running kubelet --resolv-conf=/run/systemd/resolve/resolv.conf --container-runtime=remote --tls-cert-file=/var/lib/rancher/k3s/agent/token-node.crt --healthz-bind-address=127.0.0.1 --authentication-token-webhook=true --seccomp-profile-root=/var/lib/rancher/k3s/agent/kubelet/seccomp --cni-bin-dir=/var/lib/rancher/k3s/data/4e1224c66a9dbb9b03daefff200f4f8eaf45590fb722b6fe2924a201d6de2e8d/bin --cluster-dns=10.43.0.10 --read-only-port=0 --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --hostname-override=home-ubuntu --allow-privileged=true --cgroup-driver=cgroupfs --kubeconfig=/var/lib/rancher/k3s/agent/kubeconfig.yaml --address=0.0.0.0 --anonymous-auth=false --root-dir=/var/lib/rancher/k3s/agent/kubelet --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.pem --tls-private-key-file=/var/lib/rancher/k3s/agent/token-node.key --kubelet-cgroups=/systemd/system.slice --fail-swap-on=false --cluster-domain=cluster.local --authorization-mode=Webhook --serialize-image-pulls=false --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --cert-dir=/var/lib/rancher/k3s/agent/kubelet/pki --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --runtime-cgroups=/systemd/system.slice"
May 14 20:30:54 home-ubuntu k3s[11731]: Flag --allow-privileged has been deprecated, will be removed in a future version
May 14 20:30:54 home-ubuntu k3s[11731]: W0514 20:30:54.259589   11731 server.go:214] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.265390   11731 server.go:398] Version: v1.14.1-k3s.4
May 14 20:30:54 home-ubuntu k3s[11731]: time="2019-05-14T20:30:54.293791987Z" level=info msg="waiting for node home-ubuntu: nodes \"home-ubuntu\" not found"
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.322756   11731 server.go:591] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.323120   11731 container_manager_linux.go:261] container manager verified user specified cgroup-root exists: []

May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.323205   11731 container_manager_linux.go:266] Creating Container Manager object based on Node Config: {RuntimeCgroupsName:/systemd/system.slice SystemCgroupsName: KubeletCgroupsName:/systemd/system.slice ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/rancher/k3s/agent/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity: Percentage:0.05} GracePeriod:0s MinReclaim:} {Signal:nodefs.available Operator:LessThan Value:{Quantity: Percentage:0.05} GracePeriod:0s MinReclaim:}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.323345   11731 container_manager_linux.go:286] Creating device plugin manager: true
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.323463   11731 state_mem.go:36] [cpumanager] initializing new in-memory state store
May 14 20:30:54 home-ubuntu k3s[11731]: W0514 20:30:54.330725   11731 node.go:113] Failed to retrieve node info: nodes "home-ubuntu" not found
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.330741   11731 server_others.go:148] Using iptables Proxier.
May 14 20:30:54 home-ubuntu k3s[11731]: W0514 20:30:54.330798   11731 proxier.go:314] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.334052   11731 server.go:555] Version: v1.14.1-k3s.4
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.344672   11731 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.344733   11731 conntrack.go:52] Setting nf_conntrack_max to 131072
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.344792   11731 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.344815   11731 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.345128   11731 config.go:202] Starting service config controller
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.345142   11731 controller_utils.go:1027] Waiting for caches to sync for service config controller
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.345184   11731 config.go:102] Starting endpoints config controller
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.345204   11731 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.345659   11731 kubelet.go:299] Watching apiserver
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.369664   11731 kuberuntime_manager.go:210] Container runtime containerd initialized, version: 1.2.5+unknown, apiVersion: v1alpha2
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.383909   11731 server.go:987] Started kubelet
May 14 20:30:54 home-ubuntu systemd[1]: Started Lightweight Kubernetes.
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.387048   11731 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.387288   11731 status_manager.go:152] Starting to sync pod status with apiserver
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.387298   11731 kubelet.go:1774] Starting kubelet main sync loop.
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.387307   11731 kubelet.go:1791] skipping pod synchronization - [container runtime status check may not have completed yet., PLEG is not healthy: pleg has yet to be successful.]
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.387354   11731 server.go:137] Starting to listen on 0.0.0.0:10250
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.387747   11731 server.go:328] Adding debug handlers to kubelet server.
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.388350   11731 volume_manager.go:248] Starting Kubelet Volume Manager
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.391035   11731 desired_state_of_world_populator.go:130] Desired state populator starts to run
May 14 20:30:54 home-ubuntu k3s[11731]: E0514 20:30:54.406753   11731 cri_stats_provider.go:373] Failed to get the info of the filesystem with mountpoint "/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs": unable to find data in memory cache.
May 14 20:30:54 home-ubuntu k3s[11731]: E0514 20:30:54.406777   11731 kubelet.go:1250] Image garbage collection failed once. Stats initialization may not have completed yet: invalid capacity 0 on image filesystem
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.531837   11731 kubelet_node_status.go:281] Setting node annotation to enable volume controller attach/detach
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.547893   11731 cpu_manager.go:155] [cpumanager] starting with none policy
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.548002   11731 cpu_manager.go:156] [cpumanager] reconciling every 10s
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.548091   11731 policy_none.go:42] [cpumanager] none policy: Start
May 14 20:30:54 home-ubuntu k3s[11731]: E0514 20:30:54.549209   11731 controller.go:194] failed to get node "home-ubuntu" when trying to set owner ref to the node lease: nodes "home-ubuntu" not found
May 14 20:30:54 home-ubuntu k3s[11731]: W0514 20:30:54.565052   11731 manager.go:537] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.585609   11731 kubelet_node_status.go:70] Attempting to register node home-ubuntu
May 14 20:30:54 home-ubuntu k3s[11731]: E0514 20:30:54.593410   11731 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to get node info: node "home-ubuntu" not found
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.596512   11731 kubelet_node_status.go:73] Successfully registered node home-ubuntu
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.625293   11731 kuberuntime_manager.go:946] updating runtime config through cri with podcidr 10.42.1.0/24
May 14 20:30:54 home-ubuntu k3s[11731]: time="2019-05-14T20:30:54.625888861Z" level=info msg="No cni config template is specified, wait for other system components to drop the config."
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.626302   11731 kubelet_network.go:69] Setting Pod CIDR:  -> 10.42.1.0/24
May 14 20:30:54 home-ubuntu k3s[11731]: I0514 20:30:54.805829   11731 reconciler.go:154] Reconciler: start to sync state
May 14 20:30:56 home-ubuntu k3s[11731]: I0514 20:30:56.299737   11731 flannel.go:89] Determining IP address of default interface
May 14 20:30:56 home-ubuntu k3s[11731]: I0514 20:30:56.300471   11731 flannel.go:102] Using interface with name enp0s3 and address 192.168.1.31
May 14 20:30:56 home-ubuntu k3s[11731]: I0514 20:30:56.301525   11731 kube.go:127] Waiting 10m0s for node controller to sync
May 14 20:30:56 home-ubuntu k3s[11731]: I0514 20:30:56.301647   11731 kube.go:306] Starting kube subnet manager
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.302469   11731 kube.go:134] Node controller sync successful
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.303067   11731 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.317812   11731 flannel.go:75] Wrote subnet file to /run/flannel/subnet.env
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.317825   11731 flannel.go:79] Running backend.
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.317830   11731 vxlan_network.go:60] watching for new subnet leases
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.318481   11731 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.318488   11731 iptables.go:167] Deleting iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.318758   11731 iptables.go:167] Deleting iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.319044   11731 iptables.go:167] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.1.0/24 -j RETURN
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.319311   11731 iptables.go:167] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -j MASQUERADE --random-fully
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.319592   11731 iptables.go:155] Adding iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.321515   11731 iptables.go:155] Adding iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.322163   11731 iptables.go:155] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.1.0/24 -j RETURN
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.325100   11731 iptables.go:155] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -j MASQUERADE --random-fully
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.326550   11731 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.326560   11731 iptables.go:167] Deleting iptables rule: -s 10.42.0.0/16 -j ACCEPT
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.329360   11731 iptables.go:167] Deleting iptables rule: -d 10.42.0.0/16 -j ACCEPT
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.329703   11731 iptables.go:155] Adding iptables rule: -s 10.42.0.0/16 -j ACCEPT
May 14 20:30:57 home-ubuntu k3s[11731]: I0514 20:30:57.330390   11731 iptables.go:155] Adding iptables rule: -d 10.42.0.0/16 -j ACCEPT
May 14 20:31:48 home-ubuntu k3s[11731]: I0514 20:31:48.395415   11731 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-2f6rr" (UniqueName: "kubernetes.io/secret/4c202600-7687-11e9-aa0a-000af77c5868-default-token-2f6rr") pod "nginx-jn2mw" (UID: "4c202600-7687-11e9-aa0a-000af77c5868")
May 14 20:31:48 home-ubuntu k3s[11731]: W0514 20:31:48.512614   11731 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope: no such file or directory
May 14 20:31:48 home-ubuntu k3s[11731]: W0514 20:31:48.512657   11731 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope: no such file or directory
May 14 20:31:48 home-ubuntu k3s[11731]: W0514 20:31:48.512674   11731 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope: no such file or directory
May 14 20:31:48 home-ubuntu k3s[11731]: W0514 20:31:48.512685   11731 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/system.slice/run-re59e9cc84ea14690841a3e3d69bb4d4b.scope: no such file or directory
May 14 20:31:48 home-ubuntu k3s[11731]: time="2019-05-14T20:31:48.675197793Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:nginx-jn2mw,Uid:4c202600-7687-11e9-aa0a-000af77c5868,Namespace:default,Attempt:0,}"
May 14 20:31:49 home-ubuntu k3s[11731]: time="2019-05-14T20:31:49.239254167Z" level=debug msg="Wrote ping"
May 14 20:31:50 home-ubuntu k3s[11731]: time="2019-05-14T20:31:50.963468649Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/pause:3.1,Labels:map[string]string{},}"
May 14 20:31:51 home-ubuntu k3s[11731]: time="2019-05-14T20:31:51.007659199Z" level=info msg="ImageCreate event &ImageCreate{Name:sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e,Labels:map[string]string{io.cri-containerd.image: managed,},}"
May 14 20:31:51 home-ubuntu k3s[11731]: time="2019-05-14T20:31:51.008251852Z" level=info msg="ImageUpdate event &ImageUpdate{Name:k8s.gcr.io/pause:3.1,Labels:map[string]string{io.cri-containerd.image: managed,},}"
May 14 20:31:51 home-ubuntu k3s[11731]: time="2019-05-14T20:31:51.031802056Z" level=info msg="ImageUpdate event &ImageUpdate{Name:sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e,Labels:map[string]string{io.cri-containerd.image: managed,},}"
May 14 20:31:51 home-ubuntu k3s[11731]: time="2019-05-14T20:31:51.033982425Z" level=info msg="ImageUpdate event &ImageUpdate{Name:k8s.gcr.io/pause:3.1,Labels:map[string]string{io.cri-containerd.image: managed,},}"
May 14 20:31:51 home-ubuntu k3s[11731]: time="2019-05-14T20:31:51.041692134Z" level=info msg="ImageCreate event &ImageCreate{Name:k8s.gcr.io/pause@sha256:f78411e19d84a252e53bff71a4407a5686c46983a2c2eeed83929b888179acea,Labels:map[string]string{io.cri-containerd.image: managed,},}"
May 14 20:31:51 home-ubuntu k3s[11731]: time="2019-05-14T20:31:51.134885515Z" level=info msg="shim containerd-shim started" address=/containerd-shim/k8s.io/02697cc2519c0c5fb404b3a255822bc5a71ecb9da54bf6d68d0f6be17cde5242/shim.sock debug=false pid=12085

May 14 20:33:51 home-ubuntu k3s[11731]: E0514 20:33:51.506213   11731 remote_runtime.go:186] ListPodSandbox with filter nil from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:51 home-ubuntu k3s[11731]: E0514 20:33:51.506293   11731 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:51 home-ubuntu k3s[11731]: E0514 20:33:51.506311   11731 generic.go:205] GenericPLEG: Unable to retrieve pods: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:52 home-ubuntu k3s[11731]: E0514 20:33:52.392974   11731 remote_runtime.go:186] ListPodSandbox with filter &PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:52 home-ubuntu k3s[11731]: E0514 20:33:52.393361   11731 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:52 home-ubuntu k3s[11731]: E0514 20:33:52.393732   11731 kubelet_pods.go:1019] Error listing containers: &status.statusError{Code:4, Message:"context deadline exceeded", Details:[]*any.Any(nil)}
May 14 20:33:52 home-ubuntu k3s[11731]: E0514 20:33:52.393913   11731 kubelet.go:1942] Failed cleaning pods: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:54 home-ubuntu k3s[11731]: time="2019-05-14T20:33:54.239713492Z" level=debug msg="Wrote ping"
May 14 20:33:54 home-ubuntu k3s[11731]: E0514 20:33:54.468564   11731 remote_runtime.go:186] ListPodSandbox with filter nil from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:54 home-ubuntu k3s[11731]: E0514 20:33:54.469246   11731 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:54 home-ubuntu k3s[11731]: E0514 20:33:54.469463   11731 kubelet.go:1222] Container garbage collection failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:54 home-ubuntu k3s[11731]: E0514 20:33:54.662278   11731 remote_runtime.go:186] ListPodSandbox with filter &PodSandboxFilter{Id:,State:nil,LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 14 20:33:54 home-ubuntu k3s[11731]: E0514 20:33:54.662629   11731 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to list pod stats: failed to list all pod sandboxes: rpc error: code = DeadlineExceeded desc = context deadline exceeded
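Logs like the above can be pulled straight from the journal; a minimal sketch, assuming the unit is named k3s as in the unit file above:

```shell
# Dump the k3s unit's journal since boot and keep only error-level lines:
# klog errors look like "E0514 20:33:51...", logrus errors carry level=error/fatal.
journalctl -u k3s -b --no-pager | grep -E 'level=(error|fatal)| E[0-9]{4} ' || true
```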


runningman84 commented 5 years ago

I have the same issue. Funnily enough, the k3s server (which also includes the agent) works just fine.

Are you sure that it is caused by systemd?

vincentmli commented 5 years ago

I am not sure if it is directly related to systemd. Maybe I can raise the kubelet debug level; hopefully that will give more clues.

vincentmli commented 5 years ago

I attached the k3s agent log with the kubelet log level set to 4. It appears the kubelet fails to create the sandbox for the nginx pod. There are also some factory.go errors mentioning Factory "systemd"; I am not sure if that is related. Here is the nginx pod:

root@Office-R220-vli:/home/vincent# kubectl get po -o wide
NAME          READY   STATUS              RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
nginx-kzbq6   0/1     ContainerCreating   0          24m   <none>       home-ubuntu       <none>           <none>
nginx-spws4   1/1     Running             0          24m   10.42.0.23   office-r220-vli   <none>           <none>

Search for "nginx-kzbq6" in the attached debug log:

k3s-agent.txt
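To isolate the relevant entries, the attached log can be filtered for the stuck pod's name; a simple sketch (adjust the pod name to whatever is stuck on your node):

```shell
# Show the debug-log lines that mention the stuck pod, with line numbers
grep -n 'nginx-kzbq6' k3s-agent.txt | head -n 20
```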

The systemd service used to start the k3s agent node:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/k3s agent --kubelet-arg "v=4" -s https://192.168.1.30:6443 -t "K102fcec9c1bd7ecdeb16a471571d2fb3abe6d0a3d49cb69b029d0264ea78a71e3c::node:099ebc33d9626443a5290c3cc146602a"
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target

runningman84 commented 5 years ago

@ibuildthecloud is this issue fixed in 0.6.0-rc2?

runningman84 commented 5 years ago

Version 0.6.0-rc3 suffers from the same problem:

May 29 21:45:08 cubi002 k3s[10211]: time="2019-05-29T21:45:08.124306382Z" level=debug msg="Wrote ping"
May 29 21:45:08 cubi002 k3s[10211]: time="2019-05-29T21:45:08.505468246Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
May 29 21:45:08 cubi002 k3s[10211]: time="2019-05-29T21:45:08.505845147Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
May 29 21:45:13 cubi002 k3s[10211]: time="2019-05-29T21:45:13.124379839Z" level=debug msg="Wrote ping"
May 29 21:45:13 cubi002 k3s[10211]: time="2019-05-29T21:45:13.268668978Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
May 29 21:45:13 cubi002 k3s[10211]: time="2019-05-29T21:45:13.269054802Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
May 29 21:45:18 cubi002 k3s[10211]: time="2019-05-29T21:45:18.124380620Z" level=debug msg="Wrote ping"
May 29 21:45:18 cubi002 k3s[10211]: E0529 21:45:18.237592   10211 remote_runtime.go:186] ListPodSandbox with filter nil from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 29 21:45:18 cubi002 k3s[10211]: E0529 21:45:18.238211   10211 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 29 21:45:18 cubi002 k3s[10211]: E0529 21:45:18.238270   10211 kubelet.go:1222] Container garbage collection failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
May 29 21:45:18 cubi002 k3s[10211]: time="2019-05-29T21:45:18.515309026Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
May 29 21:45:18 cubi002 k3s[10211]: time="2019-05-29T21:45:18.515686364Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
galal-hussein commented 5 years ago

I am not sure I can reproduce the issue. Steps:

Results:

I can see all pods in a running state normally. @vincentmli, can you describe the exact steps for reproducing the problem?

runningman84 commented 5 years ago

Are you also running Ubuntu 18.04? Maybe it is caused by Ubuntu.

vincentmli commented 5 years ago

I tried the following k3s-agent service on CentOS 7 with v0.5.0, and it appears to work. @galal-hussein, maybe this is an Ubuntu 18.04 thing?

[root@k3s-agent ~]# cat /etc/systemd/system/k3s-agent.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
After=network-online.target

[Service]
Type=exec
Environment="HTTP_PROXY=http://10.3.254.254:3128/"
Environment="HTTPS_PROXY=http://10.3.254.254:3128/"
EnvironmentFile=/etc/systemd/system/k3s-agent.service.env
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s agent --server https://10.3.72.189:6443  --token-file /usr/local/bin/node-token --flannel-iface ens224 --node-ip 10.169.72.98
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always

[Install]
WantedBy=multi-user.target

[root@rancher-k3s home]# kubectl get po -o wide --all-namespaces

NAMESPACE     NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE          NOMINATED NODE   READINESS GATES
default       nginx-7rlzc                   1/1     Running   0          3m51s   10.42.0.205    rancher-k3s              
default       nginx-pjh5g                   1/1     Running   0          3m51s   10.42.4.4      k3s-agent                
kube-system   cc-cluster-574dc9565c-kmkhk   1/1     Running   0          24s     10.169.72.98   k3s-agent                
kube-system   coredns-695688789-sjt4j       1/1     Running   0          15d     10.42.0.196    rancher-k3s              
runningman84 commented 5 years ago

A k3s 0.6.1 worker still does not work on Ubuntu 18.04:

-- Logs begin at Sun 2019-05-05 10:48:06 UTC, end at Thu 2019-06-20 19:17:33 UTC. --
Jun 20 19:14:53 cubi002 k3s[9214]: time="2019-06-20T19:14:53.539078347Z" level=debug msg="Wrote ping"
Jun 20 19:14:58 cubi002 k3s[9214]: time="2019-06-20T19:14:58.538764045Z" level=debug msg="Wrote ping"
Jun 20 19:14:58 cubi002 k3s[9214]: time="2019-06-20T19:14:58.804203207Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:14:58 cubi002 k3s[9214]: time="2019-06-20T19:14:58.804569265Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:03 cubi002 packetbeat[2053]: 2019-06-20T19:15:03.371Z        INFO        [monitoring]        log/log.go:144        Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks
Jun 20 19:15:03 cubi002 k3s[9214]: time="2019-06-20T19:15:03.538706957Z" level=debug msg="Wrote ping"
Jun 20 19:15:08 cubi002 k3s[9214]: time="2019-06-20T19:15:08.539240217Z" level=debug msg="Wrote ping"
Jun 20 19:15:08 cubi002 k3s[9214]: time="2019-06-20T19:15:08.815561269Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:08 cubi002 k3s[9214]: time="2019-06-20T19:15:08.815943936Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:13 cubi002 k3s[9214]: time="2019-06-20T19:15:13.539210588Z" level=debug msg="Wrote ping"
Jun 20 19:15:18 cubi002 k3s[9214]: time="2019-06-20T19:15:18.538836997Z" level=debug msg="Wrote ping"
Jun 20 19:15:18 cubi002 k3s[9214]: time="2019-06-20T19:15:18.842455930Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:18 cubi002 k3s[9214]: time="2019-06-20T19:15:18.842834007Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:23 cubi002 k3s[9214]: time="2019-06-20T19:15:23.539298797Z" level=debug msg="Wrote ping"
Jun 20 19:15:28 cubi002 k3s[9214]: time="2019-06-20T19:15:28.539269539Z" level=debug msg="Wrote ping"
Jun 20 19:15:28 cubi002 k3s[9214]: time="2019-06-20T19:15:28.852027842Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:28 cubi002 k3s[9214]: time="2019-06-20T19:15:28.852409952Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:33 cubi002 packetbeat[2053]: 2019-06-20T19:15:33.371Z        INFO        [monitoring]        log/log.go:144        Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks
Jun 20 19:15:33 cubi002 k3s[9214]: time="2019-06-20T19:15:33.539227821Z" level=debug msg="Wrote ping"
Jun 20 19:15:38 cubi002 k3s[9214]: time="2019-06-20T19:15:38.539236051Z" level=debug msg="Wrote ping"
Jun 20 19:15:38 cubi002 k3s[9214]: time="2019-06-20T19:15:38.861383583Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:38 cubi002 k3s[9214]: time="2019-06-20T19:15:38.861630787Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:43 cubi002 k3s[9214]: time="2019-06-20T19:15:43.278527065Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:43 cubi002 k3s[9214]: time="2019-06-20T19:15:43.278666501Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:43 cubi002 k3s[9214]: time="2019-06-20T19:15:43.539058389Z" level=debug msg="Wrote ping"
Jun 20 19:15:48 cubi002 k3s[9214]: time="2019-06-20T19:15:48.539264128Z" level=debug msg="Wrote ping"
Jun 20 19:15:48 cubi002 k3s[9214]: time="2019-06-20T19:15:48.869908084Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:48 cubi002 k3s[9214]: time="2019-06-20T19:15:48.870236424Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:53 cubi002 k3s[9214]: time="2019-06-20T19:15:53.539193016Z" level=debug msg="Wrote ping"
Jun 20 19:15:58 cubi002 k3s[9214]: time="2019-06-20T19:15:58.539327531Z" level=debug msg="Wrote ping"
Jun 20 19:15:58 cubi002 k3s[9214]: E0620 19:15:58.607342    9214 remote_runtime.go:132] StopPodSandbox "4991a8540ecb38dd22d73c4772b0739d476985a795447000bfbefd80f5fe904d" from runtime service failed: rpc error: code = DeadlineEx
Jun 20 19:15:58 cubi002 k3s[9214]: E0620 19:15:58.607436    9214 kuberuntime_gc.go:169] Failed to stop sandbox "4991a8540ecb38dd22d73c4772b0739d476985a795447000bfbefd80f5fe904d" before removing: rpc error: code = DeadlineExceed
Jun 20 19:15:58 cubi002 k3s[9214]: time="2019-06-20T19:15:58.608288404Z" level=error msg="StopPodSandbox for \"4991a8540ecb38dd22d73c4772b0739d476985a795447000bfbefd80f5fe904d\" failed" error="failed to stop sandbox container \
Jun 20 19:15:58 cubi002 k3s[9214]: time="2019-06-20T19:15:58.608288378Z" level=info msg="StopPodSandbox for \"763199d95d098577188715b552436231818bca5291c5a81065d1ca10839a387e\""
Jun 20 19:15:58 cubi002 k3s[9214]: time="2019-06-20T19:15:58.878185278Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:58 cubi002 k3s[9214]: time="2019-06-20T19:15:58.878473980Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:15:59 cubi002 k3s[9214]: E0620 19:15:59.630212    9214 remote_runtime.go:186] ListPodSandbox with filter nil from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 20 19:15:59 cubi002 k3s[9214]: E0620 19:15:59.630411    9214 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 20 19:15:59 cubi002 k3s[9214]: E0620 19:15:59.630477    9214 generic.go:205] GenericPLEG: Unable to retrieve pods: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 20 19:16:00 cubi002 k3s[9214]: E0620 19:16:00.619765    9214 remote_runtime.go:186] ListPodSandbox with filter &PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},} from
Jun 20 19:16:00 cubi002 k3s[9214]: E0620 19:16:00.619928    9214 kuberuntime_sandbox.go:210] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 20 19:16:00 cubi002 k3s[9214]: E0620 19:16:00.619993    9214 kubelet_pods.go:1019] Error listing containers: &status.statusError{Code:4, Message:"context deadline exceeded", Details:[]*any.Any(nil)}
Jun 20 19:16:00 cubi002 k3s[9214]: E0620 19:16:00.620084    9214 kubelet.go:1942] Failed cleaning pods: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 20 19:16:03 cubi002 packetbeat[2053]: 2019-06-20T19:16:03.371Z        INFO        [monitoring]        log/log.go:144        Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks
Jun 20 19:16:03 cubi002 k3s[9214]: time="2019-06-20T19:16:03.538701138Z" level=debug msg="Wrote ping"
Jun 20 19:16:08 cubi002 k3s[9214]: time="2019-06-20T19:16:08.538885836Z" level=debug msg="Wrote ping"
Jun 20 19:16:08 cubi002 k3s[9214]: E0620 19:16:08.837936    9214 remote_runtime.go:287] ListContainers with filter &ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rp
Jun 20 19:16:08 cubi002 k3s[9214]: E0620 19:16:08.838064    9214 container_log_manager.go:174] Failed to rotate container logs: failed to list containers: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 20 19:16:08 cubi002 k3s[9214]: E0620 19:16:08.880244    9214 remote_runtime.go:287] ListContainers with filter &ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rp
Jun 20 19:16:08 cubi002 k3s[9214]: E0620 19:16:08.880415    9214 eviction_manager.go:247] eviction manager: failed to get summary stats: failed to list pod stats: failed to list all containers: rpc error: code = DeadlineExceede
Jun 20 19:16:08 cubi002 k3s[9214]: time="2019-06-20T19:16:08.888992481Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:08 cubi002 k3s[9214]: time="2019-06-20T19:16:08.889273795Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:13 cubi002 k3s[9214]: time="2019-06-20T19:16:13.264587948Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:13 cubi002 k3s[9214]: time="2019-06-20T19:16:13.264972113Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:13 cubi002 k3s[9214]: time="2019-06-20T19:16:13.538931830Z" level=debug msg="Wrote ping"
Jun 20 19:16:18 cubi002 k3s[9214]: time="2019-06-20T19:16:18.539204550Z" level=debug msg="Wrote ping"
Jun 20 19:16:18 cubi002 k3s[9214]: time="2019-06-20T19:16:18.908940504Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:18 cubi002 k3s[9214]: time="2019-06-20T19:16:18.909152112Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:23 cubi002 k3s[9214]: time="2019-06-20T19:16:23.539301693Z" level=debug msg="Wrote ping"
Jun 20 19:16:28 cubi002 k3s[9214]: time="2019-06-20T19:16:28.538747195Z" level=debug msg="Wrote ping"
Jun 20 19:16:28 cubi002 k3s[9214]: time="2019-06-20T19:16:28.917985822Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:28 cubi002 k3s[9214]: time="2019-06-20T19:16:28.918377541Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:33 cubi002 packetbeat[2053]: 2019-06-20T19:16:33.371Z        INFO        [monitoring]        log/log.go:144        Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks
Jun 20 19:16:33 cubi002 k3s[9214]: time="2019-06-20T19:16:33.539195113Z" level=debug msg="Wrote ping"
Jun 20 19:16:38 cubi002 k3s[9214]: time="2019-06-20T19:16:38.539254965Z" level=debug msg="Wrote ping"
Jun 20 19:16:38 cubi002 k3s[9214]: time="2019-06-20T19:16:38.927483447Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:38 cubi002 k3s[9214]: time="2019-06-20T19:16:38.927865563Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:43 cubi002 k3s[9214]: time="2019-06-20T19:16:43.263789746Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:43 cubi002 k3s[9214]: time="2019-06-20T19:16:43.264100853Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:43 cubi002 k3s[9214]: time="2019-06-20T19:16:43.539205062Z" level=debug msg="Wrote ping"
Jun 20 19:16:47 cubi002 k3s[9214]: E0620 19:16:47.460167    9214 remote_runtime.go:287] ListContainers with filter &ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rp
Jun 20 19:16:47 cubi002 k3s[9214]: E0620 19:16:47.460840    9214 handler.go:314] HTTP InternalServerError serving /stats/summary: Internal Error: failed to list pod stats: failed to list all containers: rpc error: code = Deadli
Jun 20 19:16:48 cubi002 k3s[9214]: time="2019-06-20T19:16:48.539355119Z" level=debug msg="Wrote ping"
Jun 20 19:16:48 cubi002 k3s[9214]: time="2019-06-20T19:16:48.937767536Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:48 cubi002 k3s[9214]: time="2019-06-20T19:16:48.938157692Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:53 cubi002 k3s[9214]: time="2019-06-20T19:16:53.539188607Z" level=debug msg="Wrote ping"
Jun 20 19:16:58 cubi002 k3s[9214]: time="2019-06-20T19:16:58.539234781Z" level=debug msg="Wrote ping"
Jun 20 19:16:58 cubi002 k3s[9214]: time="2019-06-20T19:16:58.946340092Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1
Jun 20 19:16:58 cubi002 k3s[9214]: time="2019-06-20T19:16:58.946624154Z" level=debug msg="FIXME: Got an status-code for which error does not match any expected type!!!: -1" module=api status_code=-1

@galal-hussein is there anything (further logs or something like that) you need from us in order to fix this kind of issue with Ubuntu?

Ephemera commented 5 years ago

I fixed it with a network device cleanup:

sudo pkill containerd-shim
ip link show | grep veth | awk '{ print $2 }' | cut -d\@ -f1 | sudo xargs -I{} ip link delete {}
sudo ip link delete cni0
sudo ip link delete flannel.1
lightcode commented 5 years ago

I have the same problem with my setup. The master+agent node is able to run pods, but the node started in agent-only mode cannot.

Here are some logs from /var/lib/rancher/k3s/agent/containerd/containerd.log:

time="2019-07-11T15:28:53.631647734Z" level=error msg="failed to kill shim" error="context deadline exceeded: unknown"
time="2019-07-11T15:28:53.728004105Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:filebeat-rgz6l,Uid:a2c5ce86-a3e9-11e9-9c76-024d0b33c46b,Namespace:kube-system,Attempt:0,} failed, error" error="failed to start sandbox container: failed to create containerd task: context deadline exceeded: unknown"
time="2019-07-11T15:29:08.630744196Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:filebeat-rgz6l,Uid:a2c5ce86-a3e9-11e9-9c76-024d0b33c46b,Namespace:kube-system,Attempt:0,}"
time="2019-07-11T15:29:08.693095679Z" level=info msg="shim containerd-shim started" address=/containerd-shim/k8s.io/edca0aab6c82f1f149911388f1a2377ab57137ea39ea09dd898ebb080b79c1ec/shim.sock debug=false pid=14385

I've updated CentOS 7 to the latest release:

[root@kube-node-1 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core) 
[root@kube-node-1 ~]# /usr/local/bin/k3s --version
k3s version v0.7.0-rc5 (dcf28d98)

I'm running the same version on both the master and the agent, and it is the latest k3s release.

agaffney commented 5 years ago

I'm seeing this same issue with k3s v0.6.1 on Xenial and Bionic. When I switch from using the example systemd unit file in the git repo to running the command via sudo in an interactive shell, the pods immediately spring to life and I get no "context deadline exceeded" errors.

agaffney commented 5 years ago

After looking over the install script, I realized that there's a small difference between the systemd unit files created by the installer script and the example unit file in the root of the github repo. For the server, the installer script uses Type=notify, but it uses Type=exec for the agent. When I make this change to the unit file for the agents, everything appears to work normally.

From reading the systemd manual, the only difference between these options is how and when systemd considers the service started, for the purpose of triggering follow-up units. I'm guessing that systemd also performs some other actions that are important for the proper functioning of the agent process, but it never performs them because it is still waiting for a readiness notification from the agent.

erikwilson commented 5 years ago

That makes sense; thanks for pointing this out, @agaffney. We should modify the install script to change the type for agents, or, better yet, add code so that the agent also sends a systemd notification.
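For context, the readiness notification that Type=notify waits for is just a datagram sent to the socket named by the NOTIFY_SOCKET environment variable, per the sd_notify(3) protocol. Below is a minimal, illustrative Python sketch of that protocol; k3s itself is written in Go and would more likely use a library such as go-systemd, so this is only a demonstration of the mechanism, not the actual k3s code.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to systemd's notify socket.

    Returns True if a message was sent, or False when NOTIFY_SOCKET is
    unset (i.e. the process was not started under Type=notify, in which
    case the call is a no-op, matching sd_notify(3) semantics).
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Abstract-namespace sockets are passed with a leading "@",
        # which corresponds to a NUL byte in the real address.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.sendall(state.encode())
    return True
```

A service that never sends "READY=1" will leave a Type=notify unit stuck in "activating", which is consistent with the agent symptoms described above.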

agaffney commented 5 years ago

The install script already does the right thing. The problem is that not everybody uses the installer (I'm not a fan of curl-pipes). It would probably be a good idea to update the example systemd unit file in the repo and the docs to reflect this difference, but modifying the agent so that there is no difference is also a good idea.
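For anyone hand-writing an agent unit file in the meantime, the key change relative to the example in the repo is the Type= line. A minimal agent unit along these lines might look like the following; the server address and token path are placeholders taken from this thread, not canonical values:

```ini
[Unit]
Description=Lightweight Kubernetes (agent)
Documentation=https://k3s.io
After=network-online.target

[Service]
# The installer uses Type=notify for servers but Type=exec for agents;
# the agent sends no readiness notification, so Type=notify stalls it.
Type=exec
ExecStart=/usr/local/bin/k3s agent --server https://192.168.1.30:6443 --token-file /usr/local/bin/node-token
KillMode=process
Delegate=yes
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Restart=always

[Install]
WantedBy=multi-user.target
```

After editing the unit, run `systemctl daemon-reload` and restart the service for the change to take effect.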