Open ccamp46 opened 7 years ago
Above example was done on shc Version 3.9.3
Can you give me the shc (link would do) that works for you?
Hi @neurobin, the old version I have on a 32-bit machine that does not zombie out is shc Version 3.8.6, Generic Script Compiler. Do you want a link to that shc binary? Is that what you are asking? I cannot use that shc binary, as the binaries it produces will not run on our 64-bit boxes.
@ccamp46 link to shc-3.8.6 that you used (or you can upload it if it's not available on net).
@ccamp46 I didn't mean binary.
@neurobin Oh, I see. Here is a link to the test.sh.x compiled from 3.8.6 on 32 bit that works:
@neurobin Were you able to reproduce?
I noticed this seems to work better without the -U option
I compiled using the following combinations, with the following results:
shc -r -U -f test.sh -o test = stopped procs
shc -U -r -f test.sh -o test = stopped procs
shc -r -f test.sh -o test = no stopped procs
shc -f test.sh -o test = no stopped procs
Looks like -U is causing some issues. I haven't been able to reproduce the issue without using that flag.
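A comparison like the one above can be automated so that each flag combination is run many times and the stopped runs are counted. The following is only a rough sketch: the helper names (`proc_state`, `run_and_check`, `count_stopped`) are made up for illustration, it assumes a Linux `/proc` filesystem and a `sleep` that accepts fractional seconds, and it treats a process as stopped only after seeing state `T` on two consecutive polls.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; Linux /proc is assumed.

# Print the one-letter state (R, S, T, Z, ...) of PID $1, read from /proc.
proc_state() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Run "$@" in the background and poll it for up to about 2 seconds.
# Prints "stopped" (and resumes it with SIGCONT) if it sits in state T,
# "exited" if it finished, or "running" if it is still going.
run_and_check() {
    "$@" >/dev/null 2>&1 &
    pid=$!
    tries=0
    seen=0
    while [ "$tries" -lt 20 ]; do
        state=$(proc_state "$pid") || state=gone
        case $state in
            T)
                # Require two consecutive polls to skip very brief stops.
                seen=$((seen + 1))
                if [ "$seen" -ge 2 ]; then
                    kill -CONT "$pid"
                    wait "$pid" 2>/dev/null || :
                    echo stopped
                    return 0
                fi
                ;;
            Z|gone)
                wait "$pid" 2>/dev/null || :
                echo exited
                return 0
                ;;
            *)
                seen=0
                ;;
        esac
        sleep 0.1
        tries=$((tries + 1))
    done
    echo running
}

# Run the command $1 times and print how many runs ended up stopped.
count_stopped() {
    runs=$1
    shift
    stopped=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if [ "$(run_and_check "$@")" = "stopped" ]; then
            stopped=$((stopped + 1))
        fi
        i=$((i + 1))
    done
    echo "$stopped"
}
```

With these functions sourced, something like `shc -r -U -f test.sh -o test && count_stopped 20 ./test` would print how many of the 20 runs had to be resumed, and the same can be repeated for the other flag combinations. On older kernels a short ptrace stop may also show up as `T`, so the counts should be read as approximate.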
When it does stop, it's stopping at getenv() every time.
shc -U -D -r -f test.sh -o test
#!/bin/sh
set -x
echo
echo $$
exit
[fmsadm@nightlyprod kmatheny]$ ./test && jobs -p
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
getenv(xffffffe7d7c586cf)=<null>
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=exec './test' "$@"
argv[3]=./test
argv[4]=<null>
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
[1]+ Stopped ./test
[fmsadm@nightlyprod kmatheny]$ !?killtest
jobs -p | xargs -I{} kill -SIGCONT {} # killtest
getenv(xffffffe7d7c586cf)=18446743969955415759 1
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=
argv[3]=./test
argv[4]=<null>
[fmsadm@nightlyprod kmatheny]$ ./test && jobs -p
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
getenv(xffffffe7d805dd77)=<null>
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=exec './test' "$@"
argv[3]=./test
argv[4]=<null>
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
getenv(xffffffe7d805dd77)=18446743969959632247 1
shll=/bin/sh
argc=4
argv[0]=./test
argv[1]=-c
argv[2]=
argv[3]=./test
argv[4]=<null>
+ echo
+ echo 24708
24708
+ exit
@castcontrolmatt what kernel version are you using by the way? We have found that upgrading the kernel seemed to have solved this issue.
Currently running CentOS release 6.9 with kernel 2.6.32-642.el6.x86_64, and I see the issue. It's a fresh install on a VirtualBox VM. I updated to 2.6.32-696.1.1.el6.x86_64, restarted, and reinstalled shc, and I still have the same issue.
[root@localhost ~]# /usr/local/bin/shc -U -D -r -f test.sh -o test
[root@localhost ~]# ./test
shll=main
argc=1
argv[0]=./test
argv[1]=<null>
@castcontrolmatt You need at least 2.6.39-. @ccamp46 and I found that .32 still has issues.
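For anyone wanting to check a box against the 2.6.39 figure mentioned here, a small sketch follows. The helper names `version_key` and `kernel_at_least` are hypothetical, and the comparison looks only at the major.minor.patch part of the release string, so it cannot account for fixes a distribution has backported into an older kernel.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Turn a release string such as "2.6.32-642.el6.x86_64" into one integer
# built from major, minor and patch, so two versions can be compared.
# The body runs in a subshell so the IFS change does not leak out.
version_key() (
    v=${1%%-*}              # drop the distro suffix after the first "-"
    IFS=.
    set -- $v               # split "2.6.32" into $1 $2 $3
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    major=${major%%[!0-9]*} # keep only the leading digits of each part
    minor=${minor%%[!0-9]*}
    patch=${patch%%[!0-9]*}
    echo $(( ${major:-0} * 100000000 + ${minor:-0} * 10000 + ${patch:-0} ))
)

# Succeed if release $2 (default: the running kernel) is at least version $1.
kernel_at_least() {
    [ "$(version_key "${2:-$(uname -r)}")" -ge "$(version_key "$1")" ]
}
```

For example, `kernel_at_least 2.6.39 || echo "kernel older than 2.6.39"` checks the running kernel, while `kernel_at_least 2.6.39 2.6.32-642.el6.x86_64` fails for the CentOS 6.9 kernel reported above.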
I have a way to solve this zombie process problem. The ptrace mechanism differs between kernel versions. On a CentOS 6 system (Linux version 2.6.32-504.16.2.el6.x86_64), when ptrace(PTRACE_ATTACH, pid, 0, 0); succeeds (returns 0), PTRACE_ATTACH sends SIGSTOP to the traced thread. So I added one line to the code; with it the process continues and no zombie process is generated.
The added line is ptrace(PTRACE_SYSCALL, pid, 0, 0);
Here is ptrace's manual:
http://man7.org/linux/man-pages/man2/ptrace.2.html
"    if (!mine && errno != EBUSY)",
"        mine = !ptrace(PTRACE_ATTACH, pid, 0, 0);",
"    if (mine) {",
"        kill(pid, SIGCONT);",
"        **ptrace(PTRACE_SYSCALL, pid, 0, 0);**",
"    } else {",
"        perror(argv0);",
"        kill(pid, SIGKILL);",
"    }",
@neurobin
Can you please check how it reacts with the new flag 'H'?
@neurobin
I found that ptrace(PTRACE_SYSCALL, pid, 0, 0); can cause some kernel problems, so I changed the code: I now use ptrace(PTRACE_CONT, pid, 0, 0); in place of ptrace(PTRACE_SYSCALL, pid, 0, 0);
Tried this with multiple versions of shc, compiling with and without nearly every option. This is on a redhat/oracle linux 6.2 machine.
`# cat test.sh
#!/bin/sh
echo
echo $$
exit
# shc -r -U -f test.sh -o test
# for i in {1..20}; do echo $i;./test; done
1
[1]+ Stopped ./test
2
5067
3
5070
4
5073
5
5076
6
5079
7
5082
8
5085
9
5088
10
5091
11
5094
12
5097
13
5100
14
5103
15
5106
16
5109
17
5112
18
5115
19
5118
20
[2]+ Stopped ./test
# pgrep -l test
5064 test
5066 test
5121 test
5122 test
# ps auxwwf | grep 5064
root 5064 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5198 0.0 0.0 6380 684 pts/0 S+ 14:30 0:00 \_ grep 5064
# ps auxwwf | grep test
root 5064 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5066 0.0 0.0 0 0 pts/0 Z 14:29 0:00 | \_ [test]
root 5121 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5122 0.0 0.0 0 0 pts/0 Z 14:29 0:00 | \_ [test]
root 5201 0.0 0.0 6380 688 pts/0 S+ 14:30 0:00 \_ grep test
# kill -SIGCONT 5064
# 5064
[1]- Done ./test
# ps auxwwf | grep test
root 5121 0.0 0.0 3924 372 pts/0 T 14:29 0:00 \_ ./test
root 5122 0.0 0.0 0 0 pts/0 Z 14:29 0:00 | \_ [test]
root 5210 0.0 0.0 6380 688 pts/0 S+ 14:31 0:00 \_ grep test`
Now, I have several scripts that I compiled and have been using for months, but they get very little use. Most of the time they run fine; only a small, random percentage of runs get zombied. Has anyone else seen this issue, and is there a fix? FYI, test.sh is only a quick script so that I could share something and fill out this issue.
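As a stopgap while the cause is unresolved, the manual `kill -SIGCONT` step shown in the session can be scripted. This is a minimal sketch under stated assumptions: `resume_stopped` is a hypothetical helper, it relies on the Linux `/proc/<pid>/status` layout, and it resumes every stopped process with the given name, including any that were stopped on purpose.

```shell
#!/bin/sh
# Sketch only: resume_stopped is a hypothetical helper; Linux /proc is assumed.

# Send SIGCONT to every process named $1 that is in state T (stopped),
# and print the PID of each process that was signalled.
resume_stopped() {
    name=$1
    for dir in /proc/[0-9]*; do
        [ -r "$dir/status" ] || continue
        pname=
        pstate=
        # A process may vanish between the check and the read; ignore that.
        {
            while read -r key value rest; do
                case $key in
                    Name:)  pname=$value ;;
                    State:) pstate=$value ;;
                esac
            done < "$dir/status"
        } 2>/dev/null
        if [ "$pname" = "$name" ] && [ "$pstate" = "T" ]; then
            pid=${dir#/proc/}
            kill -CONT "$pid" 2>/dev/null && echo "$pid"
        fi
    done
}
```

Calling `resume_stopped test` after the loop in the session above would be expected to signal the two stopped `./test` processes (5064 and 5121 in that run); their zombie children should go away once the resumed parents reap them or exit.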