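The gVisor runner has been failing all the time.
There are two reasons: a wrong endpoint caused the whole error below, and containers were not being cleaned up properly.
In the log, each word of the deprecation warning ends up being treated as a pod sandbox ID: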
W0424 11:02:04.475265 259894 cleanupnode.go:99] [reset] Failed to remove containers: [failed to stop running pod I0424: output: I0424 11:01:52.572194 260177 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.579298 260177 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"I0424\": not found" podSandboxID="I0424"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"I0424\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"I0424\": not found"
: exit status 1, failed to stop running pod 11:01:52.417320: output: I0424 11:01:52.707750 260250 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.711341 260250 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"11:01:52.417320\": not found" podSandboxID="11:01:52.417320"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"11:01:52.417320\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"11:01:52.417320\": not found"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
: exit status 1, failed to stop running pod 260012: output: I0424 11:01:52.833727 260320 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.837811 260320 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"260012\": not found" podSandboxID="260012"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"260012\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"260012\": not found"
: exit status 1, failed to stop running pod util_unix.go:103]: output: I0424 11:01:52.942331 260371 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:52.946834 260371 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"util_unix.go:103]\": not found" podSandboxID="util_unix.go:103]"
time="2024-04-24T11:01:52Z" level=fatal msg="stopping the pod sandbox \"util_unix.go:103]\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"util_unix.go:103]\": not found"
: exit status 1, failed to stop running pod "Using: output: I0424 11:01:53.049928 260431 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.055111 260431 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"\\\"Using\": not found" podSandboxID="\"Using"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"\\\"Using\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"\\\"Using\": not found"
: exit status 1, failed to stop running pod this: output: I0424 11:01:53.188970 260495 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.192391 260495 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"this\": not found" podSandboxID="this"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"this\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"this\": not found"
: exit status 1, failed to stop running pod endpoint: output: I0424 11:01:53.299874 260564 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.303466 260564 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint\": not found" podSandboxID="endpoint"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"endpoint\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint\": not found"
: exit status 1, failed to stop running pod is: output: I0424 11:01:53.405669 260629 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.410281 260629 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"is\": not found" podSandboxID="is"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"is\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"is\": not found"
: exit status 1, failed to stop running pod deprecated,: output: I0424 11:01:53.513228 260677 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.516442 260677 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"deprecated,\": not found" podSandboxID="deprecated,"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"deprecated,\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"deprecated,\": not found"
: exit status 1, failed to stop running pod please: output: I0424 11:01:53.624314 260748 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.628372 260748 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"please\": not found" podSandboxID="please"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"please\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"please\": not found"
: exit status 1, failed to stop running pod consider: output: I0424 11:01:53.731128 260819 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.735064 260819 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"consider\": not found" podSandboxID="consider"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"consider\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"consider\": not found"
: exit status 1, failed to stop running pod using: output: I0424 11:01:53.832924 260872 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.836874 260872 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"using\": not found" podSandboxID="using"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"using\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"using\": not found"
: exit status 1, failed to stop running pod full: output: I0424 11:01:53.927486 260933 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:53.931793 260933 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"full\": not found" podSandboxID="full"
time="2024-04-24T11:01:53Z" level=fatal msg="stopping the pod sandbox \"full\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"full\": not found"
: exit status 1, failed to stop running pod URL: output: I0424 11:01:54.036985 261003 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.040244 261003 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL\": not found" podSandboxID="URL"
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"URL\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL\": not found"
: exit status 1, failed to stop running pod format": output: I0424 11:01:54.143087 261054 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.149392 261054 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"format\\\"\": not found" podSandboxID="format\""
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"format\\\"\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"format\\\"\": not found"
: exit status 1, failed to stop running pod endpoint="/etc/vhive-cri/vhive-cri.sock": output: I0424 11:01:54.270325 261170 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.273350 261170 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint=\\\"/etc/vhive-cri/vhive-cri.sock\\\"\": not found" podSandboxID="endpoint=\"/etc/vhive-cri/vhive-cri.sock\""
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"endpoint=\\\"/etc/vhive-cri/vhive-cri.sock\\\"\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"endpoint=\\\"/etc/vhive-cri/vhive-cri.sock\\\"\": not found"
: exit status 1, failed to stop running pod URL="unix:///etc/vhive-cri/vhive-cri.sock": output: I0424 11:01:54.370869 261262 util_unix.go:103] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/etc/vhive-cri/vhive-cri.sock" URL="unix:///etc/vhive-cri/vhive-cri.sock"
E0424 11:01:54.374940 261262 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL=\\\"unix:///etc/vhive-cri/vhive-cri.sock\\\"\": not found" podSandboxID="URL=\"unix:///etc/vhive-cri/vhive-cri.sock\""
time="2024-04-24T11:01:54Z" level=fatal msg="stopping the pod sandbox \"URL=\\\"unix:///etc/vhive-cri/vhive-cri.sock\\\"\": rpc error: code = NotFound desc = an error occurred when try to find sandbox \"URL=\\\"unix:///etc/vhive-cri/vhive-cri.sock\\\"\": not found"
: exit status 1]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
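The warning in the log above is printed because the socket is given as a bare path (`/etc/vhive-cri/vhive-cri.sock`) rather than in the full URL form it recommends (`unix:///etc/vhive-cri/vhive-cri.sock`). A minimal sketch of that normalization follows. `normalizeEndpoint` is a hypothetical helper written for illustration, not a function from vHive or kubeadm, and it only assumes that a bare absolute path means a Unix socket.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEndpoint (hypothetical helper) returns a CRI endpoint in full
// URL form. A bare absolute path is prefixed with "unix://"; an endpoint
// that already carries a scheme is returned unchanged.
func normalizeEndpoint(endpoint string) string {
	if strings.Contains(endpoint, "://") {
		return endpoint // already a full URL, e.g. unix:///...
	}
	if strings.HasPrefix(endpoint, "/") {
		return "unix://" + endpoint
	}
	return endpoint // not an absolute path: leave it for the caller to handle
}

func main() {
	// Prints: unix:///etc/vhive-cri/vhive-cri.sock
	fmt.Println(normalizeEndpoint("/etc/vhive-cri/vhive-cri.sock"))
}
```

With the full URL form the deprecation warning should no longer be emitted, so there would be no stray words to be mistaken for pod sandbox IDs.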
Also, the file system removal is not working properly:
Cleaning /run/gvisor-containerd/gvisor-containerd.sock /run/gvisor-containerd/gvisor-containerd.sock.ttrpc /run/gvisor-containerd/io.containerd.runtime.v1.linux /run/gvisor-containerd/io.containerd.runtime.v2.task
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/16/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/15/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/14/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/13/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/12/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/11/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/10/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/9/rootfs': Device or resource busy
rm: cannot remove '/run/gvisor-containerd/io.containerd.runtime.v2.task/default/8/rootfs': Device or resource busy
Cleaning /var/lib/gvisor-containerd/containerd
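`Device or resource busy` on a `rootfs` directory usually means that something is still mounted there, so it has to be unmounted before `rm` can succeed. Assuming that is the cause here, the sketch below lists the mount points under a directory from `/proc/mounts`-style text, deepest first, so that nested mounts can be unmounted before their parents. `mountsUnder` is a hypothetical helper, and the sample input is made up for illustration (it is not taken from the runner).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mountsUnder (hypothetical helper) returns the mount points found in
// /proc/mounts-style text that are at or below root. Longer paths come
// first, so a nested mount is always listed before its parent.
func mountsUnder(procMounts, root string) []string {
	root = strings.TrimSuffix(root, "/")
	var out []string
	for _, line := range strings.Split(procMounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank or malformed line
		}
		mountPoint := fields[1] // second column is the mount point
		if mountPoint == root || strings.HasPrefix(mountPoint, root+"/") {
			out = append(out, mountPoint)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if len(out[i]) != len(out[j]) {
			return len(out[i]) > len(out[j])
		}
		return out[i] < out[j] // deterministic order for equal lengths
	})
	return out
}

func main() {
	// Illustrative sample only; real input would be read from /proc/mounts.
	sample := `overlay /run/gvisor-containerd/io.containerd.runtime.v2.task/default/8/rootfs overlay rw 0 0
tmpfs /run tmpfs rw 0 0
overlay /run/gvisor-containerd/io.containerd.runtime.v2.task/default/16/rootfs overlay rw 0 0`

	for _, m := range mountsUnder(sample, "/run/gvisor-containerd") {
		fmt.Println("would unmount:", m)
	}
}
```

Each returned path could then be unmounted before the directory is removed.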
I also added Go action caching (a minor change).
I also made the GitHub runner check the go.mod file and get the Go version automatically, except for the build test.
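To illustrate what getting the Go version from go.mod involves, here is a minimal sketch that extracts the `go` directive from the file contents. `goVersionFromMod` is a hypothetical helper and the sample module is made up. This is not the mechanism the workflow itself uses, only the idea behind it.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goVersionFromMod (hypothetical helper) scans go.mod contents and returns
// the version named by the first "go" directive, e.g. "1.21".
// The boolean is false when no such directive is present.
func goVersionFromMod(modFile string) (string, bool) {
	scanner := bufio.NewScanner(strings.NewReader(modFile))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// Made-up go.mod contents for illustration.
	sample := "module example.com/demo\n\ngo 1.21\n"
	if version, ok := goVersionFromMod(sample); ok {
		fmt.Println("Go version:", version) // Prints: Go version: 1.21
	}
}
```

Reading the version from go.mod keeps the CI toolchain in step with the module, so it does not have to be updated in two places.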