neurobin / shc

Shell script compiler
https://neurobin.org/projects/softwares/unix/shc/
GNU General Public License v3.0

zombie proc (<defunct>) #26

Open ccamp46 opened 7 years ago

ccamp46 commented 7 years ago

I tried this with multiple versions of shc, compiling with and without nearly every option. This is on a Red Hat/Oracle Linux 6.2 machine.

`
# cat test.sh
#!/bin/sh
echo
echo $$
exit
#
# shc -r -U -f test.sh -o test
# for i in {1..20}; do echo $i;./test; done
1
[1]+ Stopped ./test
2
5067
3
5070
4
5073
5
5076
6
5079
7
5082
8
5085
9
5088
10
5091
11
5094
12
5097
13
5100
14
5103
15
5106
16
5109
17
5112
18
5115
19
5118
20
[2]+ Stopped ./test
#
# pgrep -l test
5064 test
5066 test
5121 test
5122 test
# ps auxwwf | grep 5064
root 5064 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5198 0.0 0.0 6380 684 pts/0 S+ 14:30 0:00 \_ grep 5064
# ps auxwwf | grep test
root 5064 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5066 0.0 0.0 0 0 pts/0 Z 14:29 0:00 | \_ [test]
root 5121 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5122 0.0 0.0 0 0 pts/0 Z 14:29 0:00 | \_ [test]
root 5201 0.0 0.0 6380 688 pts/0 S+ 14:30 0:00 \_ grep test
# kill -SIGCONT 5064
#
5064
[1]- Done ./test
# ps auxwwf | grep test
root 5121 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5122 0.0 0.0 0 0 pts/0 Z 14:29 0:00 | \_ [test]
root 5210 0.0 0.0 6380 688 pts/0 S+ 14:31 0:00 \_ grep test
`

Now, I have several scripts that I compiled and have been using for months, but they get very little use. Most of the time they run fine; only a small, random percentage of runs end up zombied. Has anyone else seen this issue, and is there a fix? FYI, test.sh is only a quick script I wrote so I would have something to share in this issue.
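The `T` and `Z` columns in the `ps` output above are the Linux process states for "stopped" and "zombie". As a minimal sketch for spotting the same condition from a program, the helper below reads the state letter from `/proc/<pid>/stat`. It assumes a Linux `/proc` filesystem; the names `proc_state` and `demo_states` are made up for illustration and are not part of shc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state of `pid` as reported in
 * /proc/<pid>/stat ('T' stopped, 'Z' zombie, ...), or '?' if unreadable. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    size_t n;
    char *p;
    FILE *f;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line looks like "pid (comm) state ...". The command name may
     * contain ')' itself, so use the last ')' and read the next field. */
    p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Hypothetical demo: stop a child, then let it die without reaping it,
 * recording the state seen at each step. Returns 0 on success, -1 on error. */
static int demo_states(char *stopped, char *zombie)
{
    siginfo_t info;
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();                /* child only waits for signals */
    }

    /* Stop the child and wait until the stop has taken effect. */
    if (kill(child, SIGSTOP) == -1 ||
        waitpid(child, &status, WUNTRACED) != child)
        return -1;
    *stopped = proc_state(child);

    /* WNOWAIT waits for the exit without reaping, so the zombie remains. */
    kill(child, SIGKILL);
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    *zombie = proc_state(child);

    waitpid(child, &status, 0);     /* reap: the zombie entry disappears */
    return 0;
}
```

Calling `demo_states(&s, &z)` should leave `'T'` in `s` and `'Z'` in `z`, matching the two states seen for the `test` processes in the report.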

ccamp46 commented 7 years ago

The example above was done with shc Version 3.9.3.

neurobin commented 7 years ago

Can you give me the shc (link would do) that works for you?

ccamp46 commented 7 years ago

Hi @neurobin, the old version I have on a 32-bit machine that does not zombie out is shc Version 3.8.6, Generic Script Compiler. Do you want a link to that shc binary? Is that what you are asking? I cannot use that shc binary, as the binaries it produces will not run on our 64-bit boxes.

neurobin commented 7 years ago

@ccamp46 A link to the shc-3.8.6 that you used (or you can upload it if it's not available on the net).

ccamp46 commented 7 years ago

@neurobin http://s000.tinyupload.com/index.php?file_id=24710340506294627336

neurobin commented 7 years ago

@ccamp46 I didn't mean binary.

ccamp46 commented 7 years ago

@neurobin Oh, I see. Here is a link to the test.sh.x compiled from 3.8.6 on 32 bit that works:

ccamp46 commented 7 years ago

https://sandc.box.com/s/nn9lwv5ihdw7ad3qqx61ubwvftaqftp5

ccamp46 commented 7 years ago

@neurobin Were you able to reproduce?

TheRealMattLear commented 7 years ago

I noticed this seems to work better without the -U option

ph-One commented 7 years ago

I compiled using the following combinations, with the following results:

Looks like -U is causing some issues. Haven't been able to reproduce the issue without using that flag.

ph-One commented 7 years ago

When it does stop, it's stopping at getenv() every time

 shc -U -D -r -f test.sh -o test
#!/bin/sh
set -x

echo
echo $$
exit
[fmsadm@nightlyprod kmatheny]$ ./test && jobs -p
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
getenv(xffffffe7d7c586cf)=<null>
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=exec './test' "$@"
argv[3]=./test
argv[4]=<null>
shll=main
argc=1
argv[0]=./test
argv[1]=<null>

[1]+  Stopped                 ./test

[fmsadm@nightlyprod kmatheny]$ !?killtest
jobs -p | xargs -I{} kill -SIGCONT {} # killtest
getenv(xffffffe7d7c586cf)=18446743969955415759 1
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=
argv[3]=./test
argv[4]=<null>
[fmsadm@nightlyprod kmatheny]$ ./test && jobs -p
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
getenv(xffffffe7d805dd77)=<null>
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=exec './test' "$@"
argv[3]=./test
argv[4]=<null>
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
getenv(xffffffe7d805dd77)=18446743969959632247 1
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=
argv[3]=./test
argv[4]=<null>
+ echo

+ echo 24708
24708
+ exit

ccamp46 commented 7 years ago

@castcontrolmatt What kernel version are you using, by the way? We have found that upgrading the kernel seems to have solved this issue.

TheRealMattLear commented 7 years ago

Currently running CentOS release 6.9 with kernel 2.6.32-642.el6.x86_64, and I see the issue. It's a fresh install on a VirtualBox VM. I have updated to 2.6.32-696.1.1.el6.x86_64, restarted, and reinstalled shc, and I'm having the same issue.

[root@localhost ~]# /usr/local/bin/shc -U -D -r -f test.sh -o test
[root@localhost ~]# ./test
shll=main
argc=1
argv[0]=./test
argv[1]=<null>

ph-One commented 7 years ago

@castcontrolmatt You need at least 2.6.39-. @ccamp46 and I found that .32 still has issues.

thleen commented 6 years ago

I have a way to solve this zombie process. The ptrace mechanism differs between kernel versions. On a CentOS 6 system (Linux version 2.6.32-504.16.2.el6.x86_64), when ptrace(PTRACE_ATTACH, pid, 0, 0) returns 0, PTRACE_ATTACH sends SIGSTOP to that thread. So I added one line to the code, after which the process continues and no zombie process is generated. The added line is ptrace(PTRACE_SYSCALL, pid, 0, 0); and it is marked with ** in the code. Here is the ptrace manual: http://man7.org/linux/man-pages/man2/ptrace.2.html

" if (!mine && errno != EBUSY)",
"   mine = !ptrace(PTRACE_ATTACH, pid, 0, 0);",
" if (mine) {",
"   kill(pid, SIGCONT);",
"   **ptrace(PTRACE_SYSCALL, pid, 0, 0);**",
" } else {",
"   perror(argv0);",
"   kill(pid, SIGKILL);",
" }",

@neurobin
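The SIGSTOP behaviour described in this comment can be observed in isolation. The following is a rough sketch, not shc's code: a parent forks a child, attaches to it with `PTRACE_ATTACH`, and reports the signal that stopped the child. The function names `attach_stop_signal` and `abandon` are hypothetical, and the sketch assumes the environment allows a process to ptrace its own child (it returns `-EPERM` otherwise). Here the parent traces the child because that is the simpler arrangement to demonstrate; the roles in shc's generated code may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Error path: close the pipe, kill and reap the child, return -err. */
static int abandon(pid_t child, int wfd, int err)
{
    close(wfd);
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return -err;
}

/*
 * Hypothetical demo: fork a child, attach to it, and return the signal that
 * stopped it (SIGSTOP is expected). Returns -errno on failure.
 */
static int attach_stop_signal(void)
{
    int fds[2], status, sig;
    char byte;
    pid_t child;

    if (pipe(fds) == -1)
        return -errno;
    child = fork();
    if (child == -1) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return -err;
    }
    if (child == 0) {
        close(fds[1]);
        /* Block until the parent closes its end of the pipe. */
        if (read(fds[0], &byte, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(fds[0]);

    if (ptrace(PTRACE_ATTACH, child, NULL, NULL) == -1)
        return abandon(child, fds[1], errno);

    /* The attach only sends SIGSTOP; wait until the tracee has stopped. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return abandon(child, fds[1], ECHILD);
    sig = WSTOPSIG(status);

    /* Detaching with data = 0 resumes the tracee and discards the SIGSTOP.
     * Without some resume request the tracee would remain in state 'T'. */
    if (ptrace(PTRACE_DETACH, child, NULL, NULL) == -1)
        return abandon(child, fds[1], errno);

    close(fds[1]);                  /* EOF lets the child's read() return */
    waitpid(child, &status, 0);
    assert(WIFEXITED(status));
    return sig;
}
```

The point of the sketch is the ordering: attach, wait for the stop, then explicitly resume. A tracer that skips the last step leaves the tracee stopped, which matches the `T` state reported earlier in this issue.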

intika commented 5 years ago

Can you please check how it reacts with the new flag 'H'?

thleen commented 5 years ago

I found that ptrace(PTRACE_SYSCALL, pid, 0, 0); can cause some kernel problems, so I changed the code to solve that: I now use ptrace(PTRACE_CONT, pid, 0, 0); in place of ptrace(PTRACE_SYSCALL, pid, 0, 0);
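One documented difference between the two requests may be relevant here, although the thread does not confirm it as the cause: `PTRACE_SYSCALL` resumes the tracee only until its next system call entry or exit, where it stops again, while `PTRACE_CONT` lets it run until a signal arrives or it exits. The sketch below counts those extra stops. The function `run_traced` and the exit status 7 are made up for illustration, and the sketch assumes a process is allowed to ptrace its own child (it returns `-EPERM` otherwise).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: attach to a forked child, then keep resuming it with
 * `request` (PTRACE_CONT or PTRACE_SYSCALL) until it exits. *stops counts
 * how often the child stopped again after the initial attach stop.
 * Returns the child's exit status (7 here), or -errno on failure.
 */
static int run_traced(int request, int *stops)
{
    int fds[2], status, err;
    char byte;
    pid_t child;

    assert(request == PTRACE_CONT || request == PTRACE_SYSCALL);
    *stops = 0;
    if (pipe(fds) == -1)
        return -errno;
    child = fork();
    if (child == -1) {
        err = errno;
        close(fds[0]);
        close(fds[1]);
        return -err;
    }
    if (child == 0) {
        close(fds[1]);
        /* Block until the parent closes its end of the pipe. */
        if (read(fds[0], &byte, 1) < 0)
            _exit(1);
        _exit(7);
    }
    close(fds[0]);

    if (ptrace(PTRACE_ATTACH, child, NULL, NULL) == -1) {
        err = errno;
        close(fds[1]);
        goto fail;
    }
    /* Wait for the stop caused by the SIGSTOP that the attach sends. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status)) {
        err = ECHILD;
        close(fds[1]);
        goto fail;
    }
    close(fds[1]);                  /* EOF lets the child's read() return */

    for (;;) {
        if (ptrace(request, child, NULL, NULL) == -1) {
            err = errno;
            goto fail;
        }
        if (waitpid(child, &status, 0) != child) {
            err = ECHILD;
            goto fail;
        }
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (!WIFSTOPPED(status))
            return -ECHILD;         /* killed by a signal, already reaped */
        ++*stops;                   /* tracee stopped again; resume it */
    }

fail:
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return -err;
}
```

With `PTRACE_CONT` the child should run straight to its exit and `*stops` stays at 0. With `PTRACE_SYSCALL` the count is positive, and every one of those stops needs a tracer that is still around to resume it.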