ncabatoff / script-exporter

Prometheus exporter to invoke scripts and parse their output as metrics.
MIT License

failed to start child: fork/exec #10

Closed microwl43 closed 5 years ago

microwl43 commented 5 years ago

I have a script in a particular directory that outputs metrics to stdout. I am getting a "failed to start child" error.

$ ./script-exporter -script.path /etc/scripts -web.listen-address :9661
2019/04/17 02:34:44 error running script 'db2_metrics': failed to start child: fork/exec /etc/scripts/db2_metrics: exec format error
2019/04/17 02:34:54 error running script 'db2_metrics': failed to start child: fork/exec /etc/scripts/db2_metrics: exec format error
2019/04/17 02:35:04 error running script 'db2_metrics': failed to start child: fork/exec /etc/scripts/db2_metrics: exec format error

$ ls -la /etc/scripts/
total 20
drwx------ 2 root root 4096 Apr 17 02:49 .
drwxr-xr-x. 102 root root 12288 Apr 16 06:48 ..
-rwx------ 1 db2inst1 db2iadm1 499 Apr 17 02:45 db2_metrics

$ /etc/scripts/db2_metrics
db2_connect_status{dbname=DBNAME,groupname=db2} 1
db2_hadr_status{dbname=DBNAME,groupname=db2} 0

Thanks

ncabatoff commented 5 years ago

I can't think of why script-exporter would get that error when the script runs fine by other means, though I suspect it has something to do with your shell. What does file /etc/scripts/db2_metrics report? What are the contents of db2_metrics? Can you paste them (eliding passwords and other sensitive information)?
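One general way a file can run from an interactive shell yet fail under fork/exec with "exec format error" is a missing or malformed interpreter line: the kernel refuses to execute a text file that does not start with "#!", whereas bash falls back to interpreting such a file itself. Whether that applied in this case is not established in this thread. A minimal sketch of a check, where has_shebang is a hypothetical helper name and the demonstration runs on a throwaway temp file rather than the real script:

```shell
#!/bin/bash
# has_shebang is a hypothetical helper, not part of script-exporter.
# It succeeds only if the first two bytes of the file are "#!".
has_shebang() {
  [ "$(head -c 2 -- "$1" 2>/dev/null)" = '#!' ]
}

# Demonstration on a throwaway file that has no interpreter line.
demo=$(mktemp)
printf 'echo hello\n' > "$demo"
if has_shebang "$demo"; then
  echo "interpreter line present"
else
  echo "no interpreter line: a direct exec can fail with 'exec format error'"
fi
rm -f "$demo"
```

To apply it to the script from this issue, one would call has_shebang /etc/scripts/db2_metrics as a user who can read that file.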

microwl43 commented 5 years ago

$ cat /etc/scripts/db2_metrics

!/bin/bash

su - db2inst1 -c "db2 connect to DBNAME 1> /dev/null"
if [ $? -ne 0 ]
then
  echo "db2_connect_status{dbname="DBNAME",groupname="db2"} 0"
else
  echo "db2_connect_status{dbname="DBNAME",groupname="db2"} 1"
fi

HADRinactive="HADR is not active."
cmdOutput=$(su - db2inst1 -c "db2pd -db DBNAME -hadr | grep \"HADR is\"")

if [ "$HADRinactive" == "$cmdOutput" ]
then
  echo "db2_hadr_status{dbname="DBNAME",groupname="db2"} 0"
else
  echo "db2_hadr_status{dbname="DBNAME",groupname="db2"} 1"
fi

ncabatoff commented 5 years ago

What do these commands output?

file /etc/scripts/db2_metrics
file /bin/bash
uname -a

microwl43 commented 5 years ago

$ file /etc/scripts/db2_metrics
/etc/scripts/db2_metrics: Bourne-Again shell script text executable
$ file /bin/bash
/bin/bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
$ uname -a
Linux 2.6.32-696.23.1.el6.x86_64 #1 SMP Sat Feb 10 11:10:31 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

ncabatoff commented 5 years ago

How mysterious. strace will almost certainly provide the answer, but it produces a lot of output. Let's start with a very constrained filter, and if need be we'll broaden it.

Please post the output of:

strace -f -e execve -s 500 bash /etc/scripts/db2_metrics

and

strace -f -e execve -s 500 ./script-exporter -script.path /etc/scripts -web.listen-address :9661

where in the latter case you arrange to have script-exporter scraped so that it invokes db2_metrics.

You will likely have to apt/yum install strace first.

microwl43 commented 5 years ago

Thanks, I have attached 2 txt files with the outputs:

strace -f -e execve -s 500 bash /etc/scripts/db2_metrics > strace_1.txt
strace -f -e execve -s 500 ./script-exporter -script.path /etc/scripts -web.listen-address :9661 > strace_2.txt

strace_2.txt strace_1.txt

ncabatoff commented 5 years ago

That strace output suggests that the script is executing just fine. I notice in your earlier ls output that the timestamp of the file is later than the timestamps in the error output. Is it possible you fixed something subsequent to those errors? If you plot rate(script_errors_total[1m]), is it nonzero for script_name="db2_metrics"?

microwl43 commented 5 years ago

Hi, thank you for your help. This is now resolved.

The issue, as expected, was in the script. In the first comment (https://github.com/ncabatoff/script-exporter/issues/10#issue-434143363), the script produced this output:

db2_connect_status{dbname=DBNAME,groupname=db2} 1
db2_hadr_status{dbname=DBNAME,groupname=db2} 0

DBNAME above is not enclosed in double quotes. I fixed the script so the output looks like:

db2_connect_status{dbname="DBNAME",groupname="db2"} 1
db2_hadr_status{dbname="DBNAME",groupname="db2"} 0
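The quotes went missing because, in the original echo lines, each inner double quote closes and reopens the outer string, so bash strips them before printing. The exact change made to the script is not shown in this thread; the following is only a sketch of one way to keep the quotes, using printf with a single-quoted format string. The helper name emit_metric and its argument order are illustrative assumptions.

```shell
#!/bin/bash
# emit_metric is a hypothetical helper, not taken from the original script.
# It prints one sample with both label values wrapped in double quotes.
emit_metric() {
  local name="$1" dbname="$2" group="$3" value="$4"
  # The single-quoted format string keeps the inner double quotes literal.
  printf '%s{dbname="%s",groupname="%s"} %s\n' "$name" "$dbname" "$group" "$value"
}

emit_metric db2_connect_status DBNAME db2 1
emit_metric db2_hadr_status DBNAME db2 0
```

Escaping the inner quotes with backslashes inside the existing echo strings would be an equally valid choice; printf just avoids the nested escaping.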

ncabatoff commented 5 years ago

Great, glad you got it working!

I'm pretty sure the error "failed to start child: fork/exec /etc/scripts/db2_metrics: exec format error" was not related to this. You can verify this after the fact. The metric script_parse_errors_total should've been nonzero while you had the missing quotes. The metric script_errors_total should have been nonzero when the script wasn't executable at all.
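A quick way to look at those two counters without Prometheus is to filter the exporter's own output. This is a rough sketch: script_error_counters is a hypothetical helper name, and the /metrics path on port 9661 is an assumption based on the usual exporter convention and the listen address used earlier in this thread.

```shell
#!/bin/bash
# script_error_counters is a hypothetical helper, not part of script-exporter.
# It reads Prometheus text exposition on stdin and prints only the sample
# lines for script_errors_total and script_parse_errors_total.
script_error_counters() {
  # Match the metric name followed by "{" (labels) or a space (no labels);
  # "# HELP" and "# TYPE" comment lines do not match because of the anchor.
  grep -E '^script_(parse_)?errors_total[{ ]'
}

# Assumed endpoint; uncomment to query a running exporter:
# curl -s http://localhost:9661/metrics | script_error_counters
```

Running the pipeline twice a few scrapes apart shows whether either counter is still increasing.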

raghutech commented 5 years ago

I am also getting the above exception intermittently when I run the docker image from my home network.

What might be the issue?

raghutech commented 5 years ago

Capturing the errors

error running script 'run': failed to start child: fork/exec /opt/script-exporter/scripts/run: exec format error