perfsonar / bwctl

A scheduling and policy framework for measurement tools
Apache License 2.0

bwctld -Z fails to start under OSX launchd #10

Closed arlake228 closed 3 years ago

arlake228 commented 9 years ago

From @arlake228 on March 14, 2015 12:47

Original issue 1043 created by arlake228 on 2015-01-09T13:57:01.000Z:

What steps will reproduce the problem?

Try to start bwctld under OSX launchd:

sudo launchctl load /Library/LaunchDaemons/homebrew.mxcl.bwctl.plist

(plist file is given below)

What is the expected output? What do you see instead?

Daemon fails to start with the following error logged:

Jan 9 13:44:21 brians-macbook-air.lan bwctld[31077] <Error>: FILE=bwctld.c, LINE=2558, setpgid(): Operation not permitted

launchd then retries every 10 seconds.

What version of the product are you using? On what operating system?

bwctl 1.5.2-10 compiled under OSX Mavericks using Homebrew, --with-iperf3

Please provide any additional information below.

The relevant code in bwctld.c is:

else{
    /*
     * Depending upon the shell that starts this -Z "foreground"
     * daemon, this process may or may not be the Process Group
     * leader... This will make sure. (Needed so HUP/TERM
     * catching can kill the whole process group with one
     * kill call.) setsid handles this when daemonizing.
     */
    mypid = getpid();
    if(setpgid(0,mypid) != 0){
        I2ErrLog(errhand,"setpgid(): %M");
        exit(1);
    }
}

I don't know why setpgid is necessary here. The reason for using the -Z flag is so that launchd can supervise the daemon as its own child process; it is not being run from an interactive shell.

owampd has the same code and is therefore likely affected in the same way.

Here is the launchd plist file I am using:

----- /Library/LaunchDaemons/homebrew.mxcl.bwctl.plist -----
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>homebrew.mxcl.bwctl</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/opt/bwctl/bin/bwctld</string>
    <string>-c</string>
    <string>/usr/local/etc</string>
    <string>-R</string>
    <string>/usr/local/var/run/bwctld</string>
    <string>-Z</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>WorkingDirectory</key>
  <string>/usr/local/var/log/bwctld</string>
  <key>StandardErrorPath</key>
  <string>/usr/local/var/log/bwctld/output.log</string>
  <key>StandardOutPath</key>
  <string>/usr/local/var/log/bwctld/output.log</string>
  <key>HardResourceLimits</key>
  <dict>
    <key>NumberOfFiles</key>
    <integer>1024</integer>
  </dict>
  <key>SoftResourceLimits</key>
  <dict>
    <key>NumberOfFiles</key>
    <integer>1024</integer>
  </dict>
</dict>
</plist>

Copied from original issue: perfsonar/project#1041

arlake228 commented 9 years ago

Comment #1 originally posted by arlake228 on 2015-01-09T13:58:25.000Z:

And for completeness here's the bwctld.conf file. Note that launchd starts the daemon as root, and then I've told it to switch to 'nobody'.

----- /usr/local/etc/bwctld.conf -----
allow_unsync
log_location
peer_port 6001-6200
facility local5
test_port 5001-5900
iperf_port 5001-5300
nuttcp_port 5301-5600
owamp_port 5601-5900
user nobody
group nobody

arlake228 commented 9 years ago

Comment #2 originally posted by arlake228 on 2015-01-09T14:45:41.000Z:

I'm not sure if setpgid is needed there, but bwctl does use it later to make sure it's killing off all the child processes for a tester. Is there something about how we're using setpgid that OS X doesn't like?

arlake228 commented 9 years ago

Comment #3 originally posted by arlake228 on 2015-01-09T15:56:50.000Z:

The manpage gives three possible scenarios for EPERM:

 [EPERM]            The process indicated by the pid argument is a session leader.

 [EPERM]            The effective user ID of the requested process is different from
                    that of the caller and the process is not a descendant of the
                    calling process.

 [EPERM]            The value of the pgid argument is valid, but does not match the
                    process ID of the process indicated by the pid argument and there
                    is no process with a process group ID that matches the value of
                    the pgid argument in the same session as the calling process.
arlake228 commented 9 years ago

Comment #4 originally posted by arlake228 on 2015-01-09T16:04:48.000Z:

(aside: that is taken from the OSX manpage, although the Linux manpage is identical)

Guess: could it be that the process started by launchd is already a session leader?
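
The first EPERM case can be checked outside launchd. The following is a minimal sketch, not code from bwctl: the helper name `setpgid_errno_in_child` is made up for illustration. It forks a child, optionally makes it a session leader with setsid(), and then has it make the same setpgid(0, getpid()) call that bwctld makes under -Z.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of bwctl).
 * Fork a child, optionally turn it into a session leader, then have it
 * call setpgid(0, getpid()) the way bwctld -Z does.
 * Returns the errno the child saw (0 if setpgid succeeded), or -1 if
 * the helper itself failed.
 */
int setpgid_errno_in_child(int become_session_leader)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        int err = 0;

        /* A freshly forked child is not a group leader, so setsid() works. */
        if (become_session_leader && setsid() == (pid_t)-1)
            _exit(255);

        if (setpgid(0, getpid()) != 0)
            err = errno;

        _exit(err);   /* pass the errno back as the exit status */
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

If the guess is right, `setpgid_errno_in_child(1)` should return EPERM, while `setpgid_errno_in_child(0)` should return 0 because an ordinary child is free to start its own process group.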

arlake228 commented 9 years ago

Comment #5 originally posted by arlake228 on 2015-01-09T16:18:47.000Z:

Yes, I can confirm. I added a patch to log some debug info:

diff --git a/bwctld/bwctld.c b/bwctld/bwctld.c
index c155e31..1eb7b38 100644
--- a/bwctld/bwctld.c
+++ b/bwctld/bwctld.c
@@ -2556,6 +2556,7 @@ main(int argc, char *argv[])
     mypid = getpid();
     if(setpgid(0,mypid) != 0){
         I2ErrLog(errhand,"setpgid(): %M");

Result:

Jan 9 16:13:27 brians-macbook-air.lan bwctld[42219] : FILE=bwctld.c, LINE=2558, setpgid(): Operation not permitted
Jan 9 16:13:27 brians-macbook-air.lan bwctld[42219] : FILE=bwctld.c, LINE=2559, getpid()=42219, getpgid(0)=42219, getsid(0)=42219

The following fix seems to do the job:

diff --git a/bwctld/bwctld.c b/bwctld/bwctld.c
index c155e31..3ec9f5f 100644
--- a/bwctld/bwctld.c
+++ b/bwctld/bwctld.c
@@ -2554,7 +2554,7 @@ main(int argc, char *argv[])
      * kill call.) setsid handles this when daemonizing.
      */
     mypid = getpid();

A similar fix is required for owampd/owampd.c line 1560
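
The diff above does not show the changed line, so the exact fix is not reproduced here. As an illustration only, one way to get a comparable effect is to skip setpgid() when the process already leads its own process group. The helper name `ensure_process_group_leader` is an assumption and does not come from the bwctl or owamp sources.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the bwctl source): make the calling
 * process the leader of its own process group, tolerating the case
 * where it already is one, for example a session leader started by
 * launchd. Returns 0 on success, -1 with errno set on failure.
 */
int ensure_process_group_leader(void)
{
    pid_t mypid = getpid();

    /* Already the group leader: nothing to do, and setpgid() could
     * fail with EPERM if we are also the session leader. */
    if (getpgrp() == mypid)
        return 0;

    return setpgid(0, mypid);
}
```

A caller would log and exit only when this returns non-zero, which keeps the existing behaviour for shells that start the daemon inside another process group.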

arlake228 commented 9 years ago

Comment #6 originally posted by arlake228 on 2015-01-09T16:25:27.000Z:

Aside: KillChildren uses killpg(mypid,signal);

So the requirement is that this process is its own process group leader. I'm certainly no expert here, but I guess that if it's the session leader then it's also the process group leader, and that's fine.
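
To show why group leadership matters for killpg(), here is a small self-contained sketch. It is not the bwctl KillChildren code; `killpg_demo` and `NUM_WORKERS` are names invented for this example. A stand-in daemon becomes its own process group leader, forks idle workers that inherit the group, and then signals all of them with a single killpg(mypid, SIGTERM).

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_WORKERS 2

/*
 * Hypothetical demo (not bwctl code).
 * Returns how many workers were terminated by SIGTERM, or -1 on error.
 */
int killpg_demo(void)
{
    pid_t daemon_pid = fork();
    if (daemon_pid < 0)
        return -1;

    if (daemon_pid == 0) {
        pid_t mypid = getpid();
        int killed = 0;

        /* A fresh child is not a session leader, so this succeeds. */
        if (setpgid(0, mypid) != 0)
            _exit(255);

        /* Make sure workers start with the default SIGTERM action. */
        signal(SIGTERM, SIG_DFL);

        for (int i = 0; i < NUM_WORKERS; i++) {
            pid_t w = fork();
            if (w < 0) {
                killpg(mypid, SIGKILL);   /* clean up, including ourselves */
                _exit(255);
            }
            if (w == 0) {
                for (;;)
                    pause();              /* idle until signalled */
            }
        }

        /* The leader is in the group too, so it ignores SIGTERM here;
         * the workers were forked earlier and keep the default action. */
        signal(SIGTERM, SIG_IGN);
        if (killpg(mypid, SIGTERM) != 0)
            _exit(255);

        for (int i = 0; i < NUM_WORKERS; i++) {
            int st;
            if (wait(&st) > 0 && WIFSIGNALED(st) && WTERMSIG(st) == SIGTERM)
                killed++;
        }
        _exit(killed);
    }

    int status;
    if (waitpid(daemon_pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

The killpg() call only reaches the workers because the group ID equals the daemon's PID. That holds both after an explicit setpgid() and when the process is already a session leader.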

arlake228 commented 9 years ago

Comment #7 originally posted by arlake228 on 2015-01-27T15:08:25.000Z:

<empty>