vmware-archive / kubeless

Kubernetes Native Serverless Framework
https://kubeless.io
Apache License 2.0

Trigger pods fail on microk8s #1046

Closed: mehdisadeghi closed this issue 5 years ago

mehdisadeghi commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST?: This is a bug report, on a fresh Ubuntu 18.04 host with a microk8s single-node cluster installation.

What happened: Functions do not run on schedule due to an error.

What you expected to happen: Functions should run on schedule.

How to reproduce it (as minimally and precisely as possible):

snap install microk8s --classic
microk8s.kubectl config view --raw > ~/.kube/config
microk8s.kubectl create ns kubeless
export RELEASE=$(curl -s https://api.github.com/repos/kubeless/kubeless/releases/latest | grep tag_name | cut -d '"' -f 4)
microk8s.kubectl create -f https://github.com/kubeless/kubeless/releases/download/$RELEASE/kubeless-$RELEASE.yaml
echo "def hello(event, context): pass" > f1.py
kubeless function deploy f1 --from-file f1.py --handler f1.hello --runtime python3.7 --schedule '*/3 * * * *'

The function deploys and runs perfectly when called; the scheduled trigger fails, however.
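
For reference, this is how the function can be invoked manually (a sketch using the kubeless CLI; the exact output depends on the handler body):

kubeless function call f1

The pods created by the scheduled trigger, on the other hand, never reach a running state: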

    NAMESPACE     NAME                                                  READY   STATUS              RESTARTS   AGE
    default       pod/f1-d8684fdc6-67q9f                                1/1     Running             0          9m21s
    default       pod/trigger-f1-1561121460-75srj                       0/1     StartError          0          8m51s
    default       pod/trigger-f1-1561121460-8db7h                       0/1     StartError          0          7m52s

The error is the same for each trigger pod:

    ➜  kubeless-playbook git:(master) ✗ kubectl describe pod trigger-f1-1561121280-4rxq6                  
Name:               trigger-f1-1561121280-4rxq6
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ubuntu-2gb-nbg1-1/195.201.121.180
Start Time:         Fri, 21 Jun 2019 14:50:35 +0200
Labels:             controller-uid=29988236-9423-11e9-9d3f-960000290a27
                    job-name=trigger-f1-1561121280
Annotations:        <none>
Status:             Failed
IP:                 10.1.1.11
Controlled By:      Job/trigger-f1-1561121280
Containers:
  trigger:
    Container ID:  containerd://4ba957cde7b4ac318bea2f4e5e5fa3e2f983244e735d789256466dad387cd801
    Image:         kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697
    Image ID:      docker.io/kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697
    Port:          <none>
    Host Port:     <none>
    Args:
      curl
      -Lv
       -H "event-id: MtO_OBDHj2ntXL8" -H "event-time: 2019-06-21 12:50:26.356815543 +0000 UTC" -H "event-type: application/json" -H "event-namespace: cronjobtrigger.kubeless.io"
      http://f1.default.svc.cluster.local:8080
    State:          Terminated
      Reason:       StartError
      Message:      failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"read init-p: connection reset by peer\"": unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 01:00:00 +0100
      Finished:     Fri, 21 Jun 2019 14:50:43 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1m
      memory:  4Mi
    Requests:
      cpu:        1m
      memory:     4Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-c2s6q (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-c2s6q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-c2s6q
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age    From                        Message
  ----     ------     ----   ----                        -------
  Normal   Scheduled  2m53s  default-scheduler           Successfully assigned default/trigger-f1-1561121280-4rxq6 to ubuntu-2gb-nbg1-1
  Normal   Pulled     2m50s  kubelet, ubuntu-2gb-nbg1-1  Container image "kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697" already present on machine
  Normal   Created    2m49s  kubelet, ubuntu-2gb-nbg1-1  Created container trigger
  Warning  Failed     2m45s  kubelet, ubuntu-2gb-nbg1-1  Error: failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"read init-p: connection reset by peer\"": unknown

Anything else we need to know?:

Environment:

    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"archive", BuildDate:"2019-06-08T16:22:01Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    Kubeless version: v1.0.3-9-g2c847c48
    Cloud provider: Hetzner
andresmgot commented 5 years ago

This issue may be related:

https://github.com/docker/for-linux/issues/597

Some people are reporting the same issue when running Docker on Ubuntu 18.04, but apparently it can be fixed by upgrading/restarting. Can you check whether that helps with your issue?

mehdisadeghi commented 5 years ago

@andresmgot Thanks for your reply.

Apparently the microk8s snap on Ubuntu 18.04 does not use Docker; it uses containerd directly. There is no Docker daemon running on my host.
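
For what it's worth, this is roughly how I checked (a sketch; the daemon names match the microk8s snap services that also show up in the logs below):

ps -ef | grep -E 'dockerd|containerd' | grep -v grep   # no dockerd, only microk8s' bundled containerd
snap services microk8s                                 # lists microk8s.daemon-containerd, no Docker service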

mehdisadeghi commented 5 years ago

This is what I get every few minutes in the system logs:

Jun 24 10:20:40 k1 microk8s.daemon-controller-manager[14622]: I0624 10:20:40.108274   14622 event.go:258] Event(v1.ObjectReference{Kind:"Job", Namespace:"default", Name:"trigger-f1-1561364280", UID:"e3a91fe1-ef0e-45dc-8317-5cdcc8c358ad", APIVersion:"batch/v1", ResourceVersion:"1050", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: trigger-f1-1561364280-vpnmh
Jun 24 10:20:40 k1 microk8s.daemon-controller-manager[14622]: E0624 10:20:40.120785   14622 job_controller.go:397] Error syncing job: failed pod(s) detected for job key "default/trigger-f1-1561364280"
Jun 24 10:20:40 k1 microk8s.daemon-kubelet[14529]: I0624 10:20:40.197006   14529 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-vmhxs" (UniqueName: "kubernetes.io/secret/65c2648d-885a-44aa-9aff-310537b8d391-default-token-vmhxs") pod "trigger-f1-1561364280-vpnmh" (UID: "65c2648d-885a-44aa-9aff-310537b8d391")
Jun 24 10:20:40 k1 systemd[1]: Started Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/65c2648d-885a-44aa-9aff-310537b8d391/volumes/kubernetes.io~secret/default-token-vmhxs.
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40.424585656+02:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:trigger-f1-1561364280-vpnmh,Uid:65c2648d-885a-44aa-9aff-310537b8d391,Namespace:default,Attempt:0,}"
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jun 24 10:20:40 k1 systemd-udevd[21815]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 24 10:20:40 k1 systemd-udevd[21815]: Could not generate persistent MAC address for vethdaf3a14b: No such file or directory
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_UP): vethdaf3a14b: link is not ready
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethdaf3a14b: link becomes ready
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered blocking state
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered disabled state
Jun 24 10:20:40 k1 kernel: device vethdaf3a14b entered promiscuous mode
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered blocking state
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered forwarding state
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40.694784259+02:00" level=info msg="shim containerd-shim started" address="/containerd-shim/k8s.io/06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94/shim.sock" debug=true pid=21835
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40+02:00" level=debug msg="registering ttrpc server"
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40+02:00" level=debug msg="serving api on unix socket" socket="[inherited from parent]"
Jun 24 10:20:42 k1 sshd[21716]: Failed password for root from 112.85.42.171 port 46357 ssh2
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.782408207+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:trigger-f1-1561364280-vpnmh,Uid:65c2648d-885a-44aa-9aff-310537b8d391,Namespace:default,Attempt:0,} returns sandbox id "06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94""
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.793869260+02:00" level=info msg="CreateContainer within sandbox "06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94" for container &ContainerMetadata{Name:trigger,Attempt:0,}"
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.870397350+02:00" level=info msg="CreateContainer within sandbox "06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94" for &ContainerMetadata{Name:trigger,Attempt:0,} returns container id "54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b""
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.871662847+02:00" level=info msg="StartContainer for "54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b""
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.874014187+02:00" level=info msg="shim containerd-shim started" address="/containerd-shim/k8s.io/54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b/shim.sock" debug=true pid=21937
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42+02:00" level=debug msg="registering ttrpc server"
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42+02:00" level=debug msg="serving api on unix socket" socket="[inherited from parent]"
Jun 24 10:20:44 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:44.968381513+02:00" level=info msg="TaskOOM event &TaskOOM{ContainerID:06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94,}"
Jun 24 10:20:44 k1 kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-998
Jun 24 10:20:44 k1 kernel: runc:[2:INIT] cpuset=54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b mems_allowed=0
Jun 24 10:20:44 k1 kernel: CPU: 1 PID: 21967 Comm: runc:[2:INIT] Not tainted 4.15.0-50-generic #54-Ubuntu
Jun 24 10:20:44 k1 kernel: Hardware name: Hetzner vServer, BIOS 20171111 11/11/2017
Jun 24 10:20:44 k1 kernel: Call Trace:
Jun 24 10:20:44 k1 kernel:  dump_stack+0x63/0x8b
Jun 24 10:20:44 k1 kernel:  dump_header+0x71/0x285
Jun 24 10:20:44 k1 kernel:  oom_kill_process+0x220/0x440
Jun 24 10:20:44 k1 kernel:  out_of_memory+0x2d1/0x4f0
Jun 24 10:20:44 k1 kernel:  mem_cgroup_out_of_memory+0x4b/0x80
Jun 24 10:20:44 k1 kernel:  mem_cgroup_oom_synchronize+0x2e8/0x320
Jun 24 10:20:44 k1 kernel:  ? mem_cgroup_css_online+0x40/0x40
Jun 24 10:20:44 k1 kernel:  pagefault_out_of_memory+0x36/0x7b
Jun 24 10:20:44 k1 kernel:  mm_fault_error+0x90/0x180
Jun 24 10:20:44 k1 kernel:  __do_page_fault+0x4a5/0x4d0
Jun 24 10:20:44 k1 kernel:  do_page_fault+0x2e/0xe0
Jun 24 10:20:44 k1 kernel:  ? async_page_fault+0x2f/0x50
Jun 24 10:20:44 k1 kernel:  do_async_page_fault+0x51/0x80
Jun 24 10:20:44 k1 kernel:  async_page_fault+0x45/0x50
Jun 24 10:20:44 k1 kernel: RIP: 0033:0x560f61a7721f
Jun 24 10:20:44 k1 kernel: RSP: 002b:000000c420145f78 EFLAGS: 00010206
Jun 24 10:20:44 k1 kernel: RAX: 000000c420145ff8 RBX: 000000c4200ac988 RCX: 000000c420000180
Jun 24 10:20:44 k1 kernel: RDX: 0000000000000000 RSI: 0000560f61f38440 RDI: 000000c420146388
Jun 24 10:20:44 k1 kernel: RBP: 000000c4201469f0 R08: 0000000000000000 R09: 0000000000000000
Jun 24 10:20:44 k1 kernel: R10: 0000560f61c91eb6 R11: 0000000000000010 R12: 0000000000000000
Jun 24 10:20:44 k1 kernel: R13: 00000000000000f1 R14: 0000000000000011 R15: 0000000000000000
Jun 24 10:20:44 k1 kernel: Task in /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391/54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b killed as a result of limit of /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391      
Jun 24 10:20:44 k1 kernel: memory: usage 4096kB, limit 4096kB, failcnt 43
Jun 24 10:20:44 k1 kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Jun 24 10:20:44 k1 kernel: kmem: usage 844kB, limit 9007199254740988kB, failcnt 0
Jun 24 10:20:44 k1 kernel: Memory cgroup stats for /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 24 10:20:44 k1 kernel: Memory cgroup stats for /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391/06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:44KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 24 10:20:44 k1 kernel: Memory cgroup stats for /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391/54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b: cache:0KB rss:3208KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:3208KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 24 10:20:44 k1 kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 24 10:20:44 k1 kernel: [21863]     0 21863      256        1    32768        0          -998 pause
Jun 24 10:20:44 k1 kernel: [21967]     0 21967    15559     1732   126976        0          -998 runc:[2:INIT]
Jun 24 10:20:44 k1 kernel: Memory cgroup out of memory: Kill process 21967 (runc:[2:INIT]) score 696 or sacrifice child
Jun 24 10:20:44 k1 kernel: Killed process 21967 (runc:[2:INIT]) total-vm:62236kB, anon-rss:3032kB, file-rss:3896kB, shmem-rss:0kB
andresmgot commented 5 years ago

It seems that the container is being killed because of an OOM issue (which is weird, since that container is just doing a curl request).

https://stackoverflow.com/questions/55304976/oom-killer-docker
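
To double-check the limit involved, you can inspect the resource spec of one of the failing pods directly (a sketch; substitute one of your own pod names, and note the jsonpath output format varies between kubectl versions):

kubectl get pod trigger-f1-1561121280-4rxq6 -o jsonpath='{.spec.containers[0].resources.limits}'
# from the describe output above this should report cpu: 1m, memory: 4Mi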

It's a long shot, but this may be related to the fact that the container is running as root. To confirm or rule that out, can you try creating a similar pod, but specify a securityContext that changes the UID it runs as? Something like:

securityContext:
  runAsUser: 1001
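
For illustration only, here is a minimal throwaway pod showing where that snippet would go (the pod name and the busybox image are just placeholders to verify the UID, not the actual trigger definition):

apiVersion: v1
kind: Pod
metadata:
  name: uid-test            # arbitrary name
spec:
  restartPolicy: Never
  securityContext:
    runAsUser: 1001          # run the container as a non-root UID
  containers:
  - name: main
    image: busybox
    command: ["id"]          # prints the effective uid/gid

After it completes, kubectl logs uid-test should print uid=1001.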
mehdisadeghi commented 5 years ago

@andresmgot would you mind describing where I should add the YAML snippet? I'm totally new to k8s and kubeless!

Moreover, on a new VM I installed everything as a normal user and still ended up with the same error. By the way, can you reproduce this on a microk8s install? On an Ubuntu VM it'd be sudo snap install microk8s --classic.

andresmgot commented 5 years ago

I will try to reproduce it. In the meantime you can try Minikube, which is a supported environment.

mehdisadeghi commented 5 years ago

@andresmgot thanks for giving it a try!

I have already tried Minikube, but it does not work with the emulated CPUs that the VM provider offers. Moreover, it does not work on the tiny VMs that I use (1 core, 2 GB RAM).

andresmgot commented 5 years ago

Have you tried https://github.com/kubernetes-sigs/kind? It's also a lightweight Kubernetes distribution, and I have tested Kubeless on that platform before.

mehdisadeghi commented 5 years ago

I haven't tried that yet. In fact, I can run Kubeless on top of Kubernetes with working scheduling inside a VM with 2 cores and 4 GB of RAM; here is my Ansible playbook for that. What I am trying to do is fit everything inside a 1-core, 2 GB instance.

andresmgot commented 5 years ago

I found the problem. I was able to reproduce the issue using the latest Minikube version: the memory limit set on the trigger container is too low. This commit fixes it:

https://github.com/kubeless/cronjob-trigger/pull/9/commits/61e04eac028c3f51d7ce6c24604ca22a425c5426

It will be available in the next release.
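
Until then, a possible manual workaround (an untested sketch; it assumes the generated CronJob is named trigger-f1 and that the controller does not immediately reconcile the change away) is to raise the memory limit on the CronJob's job template:

kubectl patch cronjob trigger-f1 --type=json -p '[
  {"op": "replace",
   "path": "/spec/jobTemplate/spec/template/spec/containers/0/resources/limits/memory",
   "value": "32Mi"}
]'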

mehdisadeghi commented 5 years ago

@andresmgot I'll try with the git version and report back. Thank you very much!

mehdisadeghi commented 5 years ago

@andresmgot I tried scheduling functions with the new v1.0.4 release and it worked perfectly. Thank you!

P.S. for future visitors: I had to enable DNS on microk8s first with microk8s.enable dns.