kata-containers / agent

Kata Containers version 1.x agent (for version 2.x, see https://github.com/kata-containers/kata-containers). Virtual machine agent for hardware-virtualized containers.
https://katacontainers.io/
Apache License 2.0

agent: Fix resource leak #880

Closed keloyang closed 3 years ago

keloyang commented 3 years ago

Every ExecProcess creates a struct for the process and container, and kata-agent needs a WaitProcess call to remove them. Normally WaitProcess is called by kata-shim (exitcode, err := shim.wait()), but in some cases kata-shim can't do that, which leaks resources, e.g. memory and /dev/pts/N. So I think the cleanup for ExecProcess can't depend on kata-shim calling WaitProcess; it should depend on whether the process has exited.
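
In rough terms, the idea is something like the following minimal Go sketch (illustrative names only, not the actual agent code): the agent itself waits for the exec'd process and releases the pty fd and the bookkeeping entry when the process exits, so cleanup no longer depends on kata-shim calling WaitProcess.

package main

import (
        "fmt"
        "os"
        "os/exec"
        "sync"
        "time"
)

// execEntry is the per-exec bookkeeping the agent keeps (illustrative only).
type execEntry struct {
        cmd       *exec.Cmd
        ptyMaster *os.File // holding this open keeps /dev/pts/N alive in the guest
        exitCode  int
}

type agent struct {
        mu    sync.Mutex
        execs map[string]*execEntry
}

// startExec starts the exec'd process and reaps it when it exits, instead of
// relying on a WaitProcess RPC that may never come if kata-shim dies.
func (a *agent) startExec(id string, cmd *exec.Cmd, master *os.File) error {
        if err := cmd.Start(); err != nil {
                return err
        }
        e := &execEntry{cmd: cmd, ptyMaster: master}
        a.mu.Lock()
        a.execs[id] = e
        a.mu.Unlock()

        go func() {
                cmd.Wait() // cleanup is driven by process exit, not by the shim
                a.mu.Lock()
                defer a.mu.Unlock()
                e.exitCode = cmd.ProcessState.ExitCode()
                if e.ptyMaster != nil {
                        e.ptyMaster.Close() // frees /dev/pts/N
                }
                delete(a.execs, id)
        }()
        return nil
}

func main() {
        a := &agent{execs: map[string]*execEntry{}}
        if err := a.startExec("exec-1", exec.Command("true"), nil); err != nil {
                panic(err)
        }
        time.Sleep(200 * time.Millisecond) // give the reaper goroutine time to run
        a.mu.Lock()
        fmt.Println("remaining execs:", len(a.execs)) // 0, even though WaitProcess was never called
        a.mu.Unlock()
}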

Take the following case, in which we can see the /dev/pts/N leak:

  1. Create a pod with a kata container (details omitted).
  2. In terminal 1, check pts and exec into the kata container:
    [root@centos0 deployment]# kubectl get pod
    NAME                     READY     STATUS    RESTARTS   AGE
    pts-6794f84bd9-jzp5b   1/1       Running   0          1h
    # check pts 
    [root@centos0 deployment]#  kubectl exec -ti pts-6794f84bd9-jzp5b ls /dev/pts
    0  ptmx
    # exec into the kata container and run sleep 30; before sleep 30 exits, kill the kata-shim (started with -terminal) in terminal 2
    [root@centos0 deployment]#  kubectl exec -ti pts-6794f84bd9-jzp5b sleep 30
    # check pts again
    [root@centos0 deployment]#  kubectl exec -ti pts-6794f84bd9-jzp5b ls /dev/pts
    0  1  ptmx
    ...
    # repeat kubectl exec -ti pts-6794f84bd9-jzp5b sleep 30 and kill kata-shim
    [root@centos0 deployment]#  kubectl exec -ti pts-6794f84bd9-jzp5b ls /dev/pts
    0  1  2  3  4  5  6  7  9  8  ptmx
  3. In terminal 2, kill the kata-shim that was created by kubectl exec -ti pts-6794f84bd9-jzp5b sleep 30:
    [root@centos1 ~]# ps -eaf|grep kata-shim
    root      572188  572134  0 18:56 ?        00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595 -exec-id 56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595
    root      572333  572314  0 18:56 ?        00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f -exec-id 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f
    root      823320  572314  0 20:32 pts/3    00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f -exec-id 014f96c3-67a6-422c-9709-33f717cc624f -terminal
    [root@centos1 ~]# kill -9 823320
keloyang commented 3 years ago

ping @jon @liwei @jbryce @gnawux PTAL, thanks.

lifupan commented 3 years ago

Hi @keloyang

I think you'd better describe your use case in detail and let us know why you killed the shim and stopped the container process reaping; then we can figure out how to solve your issue.

liubin commented 3 years ago

Agree with @lifupan .

I'm wondering what these cases are; if they are critical errors, we should not let the containers continue to run.

yyyeerbo commented 3 years ago

This deliberately induced resource leak is the same as not waiting on a process in userspace and then blaming the kernel for the resource leak.

If the shim is dead, maybe we'd better kill the container/pod associated with the dead shim instead of letting the pod continue running.

keloyang commented 3 years ago

@lifupan @liubin @yyyeerbo Thanks.

"This deliberately induced resource leak is the same as not waiting on a process in userspace and then blaming the kernel for the resource leak."

If " don't wait process in userspace", it is program bug, the author need fix. If process want reap child, but due to some exception(e.g. it was killed), os init maybe reap the exited process'child.

If kata-shim exits before shim.wait, or the connection to kata-proxy is closed before shim.wait, then WaitProcess can never be called, which leaks resources in kata-agent, and the leak is not recoverable.

If the kata-shim process was created by an exec (e.g. kubectl exec -ti ...), the kata container will not be affected by that shim, no matter whether the shim misbehaves or dies. In the following output, we can kill -9 823320 and the kata container still runs fine, but kata-agent will leak an fd on /dev/pts/ptmx (a small standalone demo of this effect follows the ps output below). The discussion for this PR is about exec.

[root@centos1 ~]# ps -eaf|grep kata-shim
root      572188  572134  0 18:56 ?        00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595 -exec-id 56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595
root      572333  572314  0 18:56 ?        00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f -exec-id 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f
root      823320  572314  0 20:32 pts/3    00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f -exec-id 014f96c3-67a6-422c-9709-33f717cc624f -terminal
[root@centos1 ~]# kill -9 823320
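
To make the /dev/pts effect concrete, here is a tiny standalone Go demo (my own illustration of pty behavior on Linux, not agent code): each open of /dev/ptmx allocates a /dev/pts/N entry that only disappears when the master fd is closed, which in the agent only happens through the WaitProcess path.

package main

import (
        "fmt"
        "os"
)

func main() {
        var masters []*os.File
        // Open three pty masters and never close them, like three execs whose
        // WaitProcess is never called.
        for i := 0; i < 3; i++ {
                m, err := os.OpenFile("/dev/ptmx", os.O_RDWR, 0)
                if err != nil {
                        panic(err)
                }
                masters = append(masters, m)
        }
        entries, err := os.ReadDir("/dev/pts")
        if err != nil {
                panic(err)
        }
        for _, e := range entries {
                fmt.Println(e.Name()) // three extra numbered entries show up
        }
        _ = masters // keep the master fds alive until the process exits
}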

I may not have explained the case clearly above, so consider the following case: kata-proxy can limit the number of concurrent "kubectl exec -ti ..." connections, and once the concurrent number reaches 3, kata-proxy closes the connection from kata-shim. This is an exception test: run the script (yamux.sh) and, some hours later, we can no longer exec into the container and will see that the container can't allocate a pts. The exception test makes the "kubectl exec" connection get closed, but that should not mean we can never exec into the container again.

# Suppose there is only one pod whose name contains nginx
# run kubectl exec -ti `kubectl get pod|grep nginx|awk '{print $1}'` bash in another terminal, and then run yamux.sh
[root@centos0 scripts]# cat yamux.sh
#!/bin/bash

for i in {1..100000};
do
        echo "=========== ${i} ==========="
        kubectl exec -ti `kubectl get pod|grep nginx|awk '{print $1}'` sleep 10
        sleep 1
done

Hack kata-proxy with the following patch:

From 3d097fc0fa3158c6ed3d159eff249d6319aa36a2 Mon Sep 17 00:00:00 2001
From: Shukui Yang <keloyangsk@gmail.com>
Date: Mon, 11 Jan 2021 16:44:54 +0800
Subject: [PATCH] Add proxy connection limit

Signed-off-by: Shukui Yang <keloyangsk@gmail.com>
---
 proxy.go | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/proxy.go b/proxy.go
index ab062a5..f2c87d5 100644
--- a/proxy.go
+++ b/proxy.go
@@ -89,7 +89,7 @@ func heartBeat(session *yamux.Session) {
        }
 }

-func serve(servConn io.ReadWriteCloser, proto, addr string, results chan error) (net.Listener, *yamux.Session, error) {
+func serve(servConn io.ReadWriteCloser, proto, addr string, results chan error, sy *shimYamux) (net.Listener, *yamux.Session, error) {
        sessionConfig := yamux.DefaultConfig()
        // Disable keepAlive since we don't know how much time a container can be paused
        sessionConfig.EnableKeepAlive = false
@@ -137,20 +137,23 @@ func serve(servConn io.ReadWriteCloser, proto, addr string, results chan error)
                                return
                        }

+                       sy.Add(1)
+
                        // add 2 for the two copy goroutines
                        wg.Add(2)
-                       go proxyConn(conn, stream, wg)
+                       go proxyConn(conn, stream, wg, sy)
                }
        }()

        return l, session, nil
 }

-func proxyConn(conn1 net.Conn, conn2 net.Conn, wg *sync.WaitGroup) {
+func proxyConn(conn1 net.Conn, conn2 net.Conn, wg *sync.WaitGroup, sy *shimYamux) {
        once := &sync.Once{}
        cleanup := func() {
                conn1.Close()
                conn2.Close()
+               sy.Done()
        }
        copyStream := func(dst io.Writer, src io.Reader) {
                _, err := io.Copy(dst, src)
@@ -164,6 +167,12 @@ func proxyConn(conn1 net.Conn, conn2 net.Conn, wg *sync.WaitGroup) {

        go copyStream(conn1, conn2)
        go copyStream(conn2, conn1)
+
+       if !sy.Validate() {
+               time.AfterFunc(time.Second*5, func() {
+                       once.Do(cleanup)
+               })
+       }
 }

 func unixAddr(uri string) (string, error) {
@@ -450,8 +459,11 @@ func realMain() {
                }
        }()

+       sy := &shimYamux{
+               yamuxMax: 5,
+       }
        results := make(chan error)
-       l, s, err := serve(servConn, "unix", listenAddr, results)
+       l, s, err := serve(servConn, "unix", listenAddr, results, sy)
        if err != nil {
                logger().WithError(err).Fatal("failed to serve")
                os.Exit(1)
@@ -479,6 +491,38 @@ func realMain() {
        logger().Debug("shutting down")
 }

+type shimYamux struct {
+       yamuxNum int
+       yamuxMax int
+
+       sync.Mutex
+}
+
+func (s *shimYamux) Add(num int) {
+       s.Lock()
+       defer s.Unlock()
+
+       s.yamuxNum += num
+}
+
+func (s *shimYamux) Done() {
+       s.Lock()
+       defer s.Unlock()
+
+       s.yamuxNum--
+}
+
+func (s *shimYamux) Validate() bool {
+       s.Lock()
+       defer s.Unlock()
+
+       if s.yamuxNum > s.yamuxMax {
+               return false
+       }
+       return true
+}
+
+
 func main() {
        defer handlePanic()
        realMain()
-- 
2.19.0
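
For reference, the limiter in the hack above behaves like this small standalone example (the same Add/Done/Validate shape, extracted from the patch):

package main

import (
        "fmt"
        "sync"
)

// shimYamux mirrors the limiter added in the hack: a counter with a cap.
type shimYamux struct {
        yamuxNum int
        yamuxMax int
        sync.Mutex
}

func (s *shimYamux) Add(num int) {
        s.Lock()
        defer s.Unlock()
        s.yamuxNum += num
}

func (s *shimYamux) Done() {
        s.Lock()
        defer s.Unlock()
        s.yamuxNum--
}

func (s *shimYamux) Validate() bool {
        s.Lock()
        defer s.Unlock()
        return s.yamuxNum <= s.yamuxMax
}

func main() {
        sy := &shimYamux{yamuxMax: 5}
        for i := 1; i <= 7; i++ {
                sy.Add(1)
                // Streams beyond the cap fail Validate; in the hacked proxyConn they
                // are closed 5 seconds later, which is how the shim connection is cut.
                fmt.Printf("stream %d allowed=%v\n", i, sy.Validate())
        }
}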
fidencio commented 3 years ago

Folks, this PR has gone stale and I'm closing it as we're moving towards the final 1.13.0 release of kata-containers 1.x. From now on, all development will happen in https://github.com/kata-containers/kata-containers/ and fixes should be provided to that repo.

Thanks a lot for your contribution; we're sorry that for some reason it got stale, and we hope to see you around for new adventures in kata-containers 2.x land.