saltstack / salt


Master doesn't stop when Minion's shell has a "nohup" function #14420

Closed: binlee1990 closed this issue 10 years ago

binlee1990 commented 10 years ago

When I execute the Salt command "salt 'sn*' state.sls flume.startup -t 10", it doesn't return a result until I kill the process that the shell script started on the minion.

startup.sls

Start Flume:
  cmd.run:
    - cwd: /opt/flume/bin
    - name: ./start_agent.sh
    - user: root
    - group: root
    - stateful: False

start_agent.sh

#!/bin/bash
dir=`dirname $0`
cd $dir

printf $(date +%F)" "$(date +%H:%M:%S.%N)" Try to start flume server.\n" |tee -a ../logs/start_agent.log
nohup ./flume-ng agent --conf ../conf --conf-file ../conf/custem.conf --name agent -Dflume.monitoring.type=com.suning.flume.monitor.AgentCenterServer &

printf $(date +%F)" "$(date +%H:%M:%S.%N)" Now, start add new crontab.\n" |tee -a ../logs/start_agent.log
./flume-ng-crontab add | tee -a ../logs/start_agent.log

ps -ef --sort=-start_time |grep org.apache.flume.node.Application |grep -v grep| awk '{print $2}' | head -1 > ../logs/agent.pid &

printf $(date +%F)" "$(date +%H:%M:%S.%N)" Flume server has already started. Please check out the flume log.\n" |tee -a ../logs/start_agent.log

Is there a problem with the shell script using "nohup"?

When I stop the shell script on the minion, the master gets the result:

sncddevweb02:
----------
    State: - cmd
    Name:      ./start_agent.sh
    Function:  run
        Result:    True
        Comment:   Command "./start_agent.sh" run
        Changes:   pid: 3779
                   retcode: 0
                   stderr: Info: Sourcing environment configuration script /opt/flume/conf/flume-env.sh
+ exec ../java/jre/bin/java -Xmx128m -Dflume.monitoring.postHost=http://10.19.250.191:9090/monitor-admin -Dflume.monitoring.urlService=/logAgent/flume/setMonitorData -Dflume.monitoring.pollFrequency=60 -Dflume.monitoring.type=com.suning.flume.monitor.AgentCenterServer -cp '/opt/flume/conf:/opt/flume/lib/*:/opt/flume/plugins.d/suning-filechannel/lib/*:/opt/flume/plugins.d/suning-monitor/lib/*:/opt/flume/plugins.d/suning-sink/lib/*:/opt/flume/plugins.d/suning-source/lib/*:/opt/flume/plugins.d/suning-monitor/libext/*:/opt/flume/plugins.d/suning-sink/libext/*' -Djava.library.path= org.apache.flume.node.Application --conf-file ../conf/custem.conf --name agent
                   stdout: 2014-07-23 10:53:40.173415716 Try to start flume server.
2014-07-23 10:53:40.179870304 Now, start add new crontab.
2014-07-23 10:53:40.192697997 Add flume crontab complete!
2014-07-23 10:53:40.205635045 Flume server has already started. Please check out the flume log.

Summary
------------
Succeeded: 1
Failed:    0
------------
Total:     1

basepi commented 10 years ago

So the script doesn't exit properly by itself? What do you mean by "stop the shell in the minion", what did you do to stop the shell?

We have had some issues with backgrounding processes in Salt, and I think they're related to the subprocess module in Python more than anything that Salt does. Would you mind trying to run the shell script via subprocess in a Python shell and see if it exits properly there?
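One way to try this is sketched below, assuming Python 3 on a POSIX system. The helper name `finishes_in_time` is hypothetical and this is not how Salt's cmd.run is implemented; it only mirrors the general pattern of capturing a command's output through pipes. Note that `communicate()` waits for end-of-file on the pipes, not just for the script to exit, so a background child that inherits stdout or stderr keeps it waiting.

```python
import os
import signal
import subprocess


def finishes_in_time(cmd, timeout=10):
    """Return True if `cmd` exits and its output pipes reach EOF within `timeout` seconds.

    Hypothetical helper for reproducing the hang; not part of Salt.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # own process group, so cleanup can reach background children
    )
    try:
        # Blocks until the shell exits AND every holder of the pipes has closed them.
        proc.communicate(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Kill the whole process group (the script plus anything it backgrounded).
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the process and drain the pipes
        return False
```

Calling `finishes_in_time("./start_agent.sh")` from /opt/flume/bin would then show whether the script hangs under plain subprocess as well. Be aware that on a timeout the helper kills the script and everything it started, including the Flume agent.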

basepi commented 10 years ago

Also, what version of salt are you using?

binlee1990 commented 10 years ago

@basepi The Salt version is 0.17.2; I'm trying to upgrade it to the latest version. By "stop the shell in the minion" I mean that I run kill -9 on the process that the shell script starts.

I think the shell script starts a process that keeps writing output the whole time, so Salt can't tell that the script itself has finished.

My solution is to add a redirect to a file in the shell script:

nohup ./flume-ng agent --conf ../conf --conf-file ../conf/custem.conf --name agent -Dflume.monitoring.type=com.suning.flume.monitor.AgentCenterServer >> ../logs/start_agent.log 2>&1 &

After that, Salt gets the execution result without my having to kill the process on the minion.
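A likely explanation for why the redirect helps, assuming Salt captures the command's output through pipes the way subprocess does: the backgrounded process inherits the write end of those pipes, so the reader never sees end-of-file while it is alive. Redirecting its output releases the pipes. The sketch below (Python 3.7+, POSIX) times both cases, with `sleep 2` standing in for the long-running Flume agent; `timed_capture` and `demo` are hypothetical names used only for illustration.

```python
import subprocess
import time


def timed_capture(cmd):
    """Run `cmd` through the shell, capture its output, and return (seconds, output)."""
    start = time.monotonic()
    result = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        text=True,
    )
    return time.monotonic() - start, result.stdout


def demo():
    # The background child inherits the pipe: the capture lasts as long as the child.
    held, _ = timed_capture("sleep 2 & echo started")
    # The background child's output is redirected: the capture ends when the shell exits.
    freed, _ = timed_capture("sleep 2 >/dev/null 2>&1 & echo started")
    return held, freed
```

In the first case the call should take about as long as the background process lives; in the second it should return almost immediately, which matches the behaviour reported above once the redirect was added.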

basepi commented 10 years ago

Yep, I think that's an issue with the Python subprocess module, not anything salt can deal with. I've marked this as an upstream bug, and am going to go ahead and close it, since you've found a good workaround. Thanks again!