Closed: keloyang closed this pull request 3 years ago
ping @jon @liwei @jbryce @gnawux PTAL, thanks.
Hi @keloyang
I think you'd better describe your use case in detail and let us know why you killed the shim and stopped the container process reaping, then we can figure out how to solve your issue.
Agree with @lifupan .
I'm wondering what these cases are; if they are critical errors, we should not let the containers keep running.
This deliberate, man-made resource leak is the same as not waiting on a process in userspace and then blaming the kernel for the leak.
If the shim is dead, maybe we'd better kill the container/pod associated with the dead shim, instead of letting the pod continue running.
@lifupan @liubin @yyyeerbo Thanks.
"This deliberate, man-made resource leak is the same as not waiting on a process in userspace and then blaming the kernel for the leak."
If a program simply "doesn't wait on its processes in userspace", that is a program bug the author needs to fix. But if a process wants to reap its child and fails because of some exception (e.g. it was killed), the OS init process may reap the exited process's children.
If kata-shim exits before shim.wait, or the connection to kata-proxy is closed before shim.wait, then WaitProcess is never called, which leaks resources in kata-agent, and that is not recoverable.
If the kata-shim process was created by exec (e.g. kubectl exec -ti ...), the kata container is not affected no matter whether that shim misbehaves or dies. In the following output we can kill -9 823320 and the kata container will still run well, but kata-agent will leak an fd for /dev/pts/ptmx. The discussion in this PR is about exec.
[root@centos1 ~]# ps -eaf|grep kata-shim
root 572188 572134 0 18:56 ? 00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595 -exec-id 56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595
root 572333 572314 0 18:56 ? 00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f -exec-id 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f
root 823320 572314 0 20:32 pts/3 00:00:00 /kata/bin/kata-shim -agent unix:///run/vc/sbs/56b5e8192430090ba129a7a4823d646a46bac5684055a3e557e8c87efaf37595/proxy.sock -container 6e8a25168697148e7065a9253c2caa5626548f6ed02c877ddc9dc93caaa6b76f -exec-id 014f96c3-67a6-422c-9709-33f717cc624f -terminal
[root@centos1 ~]# kill -9 823320
I may not have explained the case clearly above, so use the following one. kata-proxy can limit the number of concurrent connections for the kubectl exec -ti ... command; once the concurrent number reaches 3, kata-proxy closes the connection from kata-shim. This is an exception test: run the script (yamux.sh), and some hours later we can no longer exec into the container and will see that the container can't allocate a pts. The exception test causes the kubectl exec connections to be closed, but that should not prevent us from exec'ing into the container again.
# Suppose there is only one pod whose name contains nginx
# Run kubectl exec -ti `kubectl get pod|grep nginx|awk '{print $1}'` bash in another terminal, then run yamux.sh
[root@centos0 scripts]# cat yamux.sh
#!/bin/bash
for i in {1..100000};
do
echo "=========== ${i} ==========="
kubectl exec -ti `kubectl get pod|grep nginx|awk '{print $1}'` sleep 10
sleep 1
done
Hack for kata-proxy:
From 3d097fc0fa3158c6ed3d159eff249d6319aa36a2 Mon Sep 17 00:00:00 2001
From: Shukui Yang <keloyangsk@gmail.com>
Date: Mon, 11 Jan 2021 16:44:54 +0800
Subject: [PATCH] Add proxy connection limit
Signed-off-by: Shukui Yang <keloyangsk@gmail.com>
---
proxy.go | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 48 insertions(+), 4 deletions(-)
diff --git a/proxy.go b/proxy.go
index ab062a5..f2c87d5 100644
--- a/proxy.go
+++ b/proxy.go
@@ -89,7 +89,7 @@ func heartBeat(session *yamux.Session) {
}
}
-func serve(servConn io.ReadWriteCloser, proto, addr string, results chan error) (net.Listener, *yamux.Session, error) {
+func serve(servConn io.ReadWriteCloser, proto, addr string, results chan error, sy *shimYamux) (net.Listener, *yamux.Session, error) {
sessionConfig := yamux.DefaultConfig()
// Disable keepAlive since we don't know how much time a container can be paused
sessionConfig.EnableKeepAlive = false
@@ -137,20 +137,23 @@ func serve(servConn io.ReadWriteCloser, proto, addr string, results chan error)
return
}
+ sy.Add(1)
+
// add 2 for the two copy goroutines
wg.Add(2)
- go proxyConn(conn, stream, wg)
+ go proxyConn(conn, stream, wg, sy)
}
}()
return l, session, nil
}
-func proxyConn(conn1 net.Conn, conn2 net.Conn, wg *sync.WaitGroup) {
+func proxyConn(conn1 net.Conn, conn2 net.Conn, wg *sync.WaitGroup, sy *shimYamux) {
once := &sync.Once{}
cleanup := func() {
conn1.Close()
conn2.Close()
+ sy.Done()
}
copyStream := func(dst io.Writer, src io.Reader) {
_, err := io.Copy(dst, src)
@@ -164,6 +167,12 @@ func proxyConn(conn1 net.Conn, conn2 net.Conn, wg *sync.WaitGroup) {
go copyStream(conn1, conn2)
go copyStream(conn2, conn1)
+
+ if !sy.Validate() {
+ time.AfterFunc(5*time.Second, func() {
+ once.Do(cleanup)
+ })
+ }
}
func unixAddr(uri string) (string, error) {
@@ -450,8 +459,11 @@ func realMain() {
}
}()
+ sy := &shimYamux{
+ yamuxMax: 5,
+ }
results := make(chan error)
- l, s, err := serve(servConn, "unix", listenAddr, results)
+ l, s, err := serve(servConn, "unix", listenAddr, results, sy)
if err != nil {
logger().WithError(err).Fatal("failed to serve")
os.Exit(1)
@@ -479,6 +491,38 @@ func realMain() {
logger().Debug("shutting down")
}
+type shimYamux struct {
+ yamuxNum int
+ yamuxMax int
+
+ sync.Mutex
+}
+
+func (s *shimYamux) Add(num int) {
+ s.Lock()
+ defer s.Unlock()
+
+ s.yamuxNum += num
+}
+
+func (s *shimYamux) Done() {
+ s.Lock()
+ defer s.Unlock()
+
+ s.yamuxNum--
+}
+
+func (s *shimYamux) Validate() bool {
+ s.Lock()
+ defer s.Unlock()
+
+ if s.yamuxNum > s.yamuxMax {
+ return false
+ }
+ return true
+}
+
+
func main() {
defer handlePanic()
realMain()
--
2.19.0
Folks, this PR has been stale and I'm closing it as we're walking towards the final 1.13.0 release of the kata-containers-1.x. From now on, all the development will happen as part of https://github.com/kata-containers/kata-containers/ and fixes should be provided to that repo.
Thanks a lot for your contribution, we're sorry that for some reason it got stale, and we hope to see you around for new adventures in the kata-containers-2.x land.
Every ExecProcess creates a struct for the process and the container, and kata-agent needs to call WaitProcess to remove them. Normally WaitProcess is called by kata-shim (exitcode, err := shim.wait()), but kata-shim can't do that in some cases, which leaks resources, e.g. memory and /dev/pts/N. So the cleanup for ExecProcess can't depend on kata-shim calling WaitProcess; it should depend on whether the process has exited.
With a case like the one shown above, we will see the /dev/pts/N leak.