Closed. @zerthimon closed this issue 7 years ago.
@zerthimon I'm not able to replicate this. Here is my test case:
root@vagrant-ubuntu-trusty-64:~# cat /srv/salt/test.sls
mysql_server_packages:
  pkg.installed:
    - names:
      - mysql-server
      - python-mysqldb
Run salt-call:
salt-call --local state.sls test -ldebug
The packages are installed without delay:
Summary for local
------------
Succeeded: 2 (changed=2)
Failed: 0
------------
Total states run: 2
root@vagrant-ubuntu-trusty-64:~# salt --versions-report
Salt Version:
Salt: 2015.8.10
Dependency Versions:
Jinja2: 2.7.2
M2Crypto: 0.21.1
Mako: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.1
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
RAET: Not Installed
Tornado: 4.3
ZMQ: 4.0.4
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
libgit2: Not Installed
libnacl: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.3.0
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
python-gnupg: Not Installed
smmap: Not Installed
timelib: Not Installed
System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-65-generic
system: Ubuntu 14.04 trusty
Is there something wrong in my test case above? Anything that would help me replicate this issue?
Now, if the watched config file changes, the service restarts but salt-call never finishes. It is waiting on:
[INFO ] Running state [mysql] at time 12:45:20.220713
[INFO ] Executing state service.mod_watch for mysql
[INFO ] Executing command ['service', 'mysql', 'status'] in directory '/root'
[INFO ] Executing command ['service', 'mysql', 'restart'] in directory '/root'
The service is restarted, the init script is a zombie, salt-call never finishes:
4 S root 5450 2288 0 80 0 - 162966 pipe_w 12:45 pts/0 00:00:01 /usr/bin/python /usr/bin/salt-call state.highstate
0 Z root 5601 5450 0 80 0 - 0 exit 12:45 pts/0 00:00:00 [mysql] <defunct>
0 S mysql 5683 1 0 80 0 - 1112 wait 12:45 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
0 S mysql 6477 5683 0 80 0 - 1004890 poll_s 12:45 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/storage/mysql --plugin-dir=/usr/lib/mysql/plugin --log-error=/var/log/mysql/mysqld.log --open-files-limit=10000 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
salt-call process straced:
root@db02:~# strace -p 5450
Process 5450 attached
read(57,
salt-call process lsof:
salt-call 5450 root 57r FIFO 0,9 0t0 22461 pipe
The same situation occurs when the service is stopped and salt-call enforces the start:
[INFO ] Executing command ['service', 'mysql', 'status'] in directory '/root'
[ERROR ] Command ['service', 'mysql', 'status'] failed with return code: 3
[ERROR ] output: * MySQL Community Server 5.7.12 is not running
[INFO ] Executing command ['service', 'mysql', 'start'] in directory '/root'
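The strace/lsof output above (salt-call blocked in read() on a pipe) can be reproduced without mysql at all. Here is a minimal Python sketch of the same failure mode, assuming only a POSIX shell: the direct child exits immediately, but a backgrounded grandchild inherits the write end of the stdout pipe, so the reader never sees EOF.

```python
import select
import subprocess

# A stand-in for a misbehaving init script: the shell exits immediately,
# but the backgrounded child (here: sleep) inherits the write end of the
# stdout pipe, so the pipe never reaches EOF.
proc = subprocess.Popen(
    ["sh", "-c", "sleep 5 & exit 0"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
proc.wait()  # the direct child is gone almost instantly...

# ...yet a blocking read() here would hang for the full 5 seconds, because
# 'sleep' still holds the write end open. Probe with a timeout instead:
readable, _, _ = select.select([proc.stdout], [], [], 1.0)
print("pipe reached EOF:", bool(readable))  # pipe reached EOF: False
```

Replace `sleep 5` with `mysqld_safe` and the reader with salt-call, and this is exactly the hang described in this issue.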
@Ch3LL To reproduce use the following SLS
root@db01:/srv/salt# cat test.sls
mysql_repo:
  pkgrepo.managed:
    - names:
      - deb http://repo.mysql.com/apt/ubuntu/ trusty mysql-5.7
    - file: /etc/apt/sources.list.d/mysql.list
    - keyserver: pgp.mit.edu
    - keyid: 5072E1F5
    - enabled: True
    - refresh_db: True

mysql_server_debconf:
  debconf.set:
    - name: mysql-community-server
    - data:
        'mysql-community-server/root-pass': {'type': 'password', 'value': 'root'}
        'mysql-community-server/re-root-pass': {'type': 'password', 'value': 'root'}

mysql_server_packages:
  pkg.installed:
    - names:
      - mysql-server
    - require:
      - debconf: mysql_server_debconf
This may be related: https://github.com/saltstack/salt/issues/30255
Another related issue https://github.com/saltstack/salt/issues/16153
What's weird about this is that when I just run `service mysql restart` in a shell, it works fine, but when salt does the same thing, salt hangs!
Edit: I'm seeing this on all servers with MySQL Server Community Edition installed from the official MySQL (Oracle) repo. Edit 2: @Ch3LL, have you been able to replicate the issue with the state I provided?
another related issue: https://github.com/saltstack/salt/issues/9736
@zerthimon I'm wondering if you are actually running into issue #36116. As that issue states, it looks like the service files for mysql have changed, which has caused a hanging issue when trying to start the mysql community server service.
@Ch3LL I have solved this problem for myself by changing the mysql init.d file:
I had to replace 2 lines (the original lines are commented out) to use Debian's start-stop-daemon instead of su:
# su - mysql -s /bin/bash -c "/usr/sbin/mysqld --log_error_verbosity=2 2>&1 > /dev/null &"
start-stop-daemon --start --quiet --background --user mysql --exec /usr/sbin/mysqld
and:
# su - mysql -s /bin/bash -c "mysqld_safe > /dev/null &"
start-stop-daemon --start --quiet --background --exec /usr/bin/mysqld_safe
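For contrast, here is a rough Python sketch (a simplified illustration, not the actual tool) of what start-stop-daemon --background does that `su -c "... &"` does not: the daemonized child detaches and redirects its stdio to /dev/null, so it stops holding any inherited pipe open.

```python
import os

def daemonize(target):
    """Simplified daemonizer: double-fork, detach from the terminal, and
    (crucially) redirect stdio to /dev/null so no inherited pipe is kept
    open by the long-running process."""
    if os.fork():
        return                      # caller (the init script) continues
    os.setsid()                     # new session, no controlling terminal
    if os.fork():
        os._exit(0)                 # session leader exits
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)        # release any inherited pipes
    if devnull > 2:
        os.close(devnull)
    target()
    os._exit(0)
```

A plain `su ... -c "cmd &"` skips the stdio redirection step, which is exactly why the backgrounded mysqld keeps salt-call's pipe open.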
In my SLS state I ensure that the init.d file is created BEFORE the mysql community package is installed.
P.S.: So it seems to me that `su` together with `bash` causes the problem... I'm not sure what exactly is happening there (signal forwarding? leaked descriptors? tty problems?).
EDIT: There is a small utility, https://github.com/tianon/gosu , which is used in building docker containers and is made to overcome the problems caused by `su`. Just thought it may be solving the same problem we have.
Also, another workaround: https://github.com/saltstack/salt/issues/9736#issuecomment-176351724 by @michalsuba, which is excellent and shows that it's a problem of a leaked file descriptor.
Now the question is: can salt detect such cases and not hang?
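One hypothetical way salt could avoid hanging (a sketch, not Salt's actual implementation; `run_no_hang` is an invented name): reap the direct child first, then drain whatever is buffered in the pipe non-blocking, instead of reading until EOF.

```python
import fcntl
import os
import subprocess

def run_no_hang(cmd, timeout=10):
    """Run cmd, but never block on a descriptor leaked to a grandchild:
    wait for the direct child, then drain the pipe non-blocking."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    proc.wait(timeout=timeout)   # direct child is reaped; grandchildren may linger
    fd = proc.stdout.fileno()
    fcntl.fcntl(fd, fcntl.F_SETFL,
                fcntl.fcntl(fd, fcntl.F_GETFL) | os.O_NONBLOCK)
    chunks = []
    while True:
        try:
            chunk = os.read(fd, 65536)
        except BlockingIOError:
            break      # pipe still open on the far side, but nothing buffered
        if not chunk:
            break      # genuine EOF
        chunks.append(chunk)
    proc.stdout.close()
    return proc.returncode, b"".join(chunks)

rc, out = run_no_hang(["sh", "-c", "echo started; sleep 3 & exit 0"])
print(rc, out)  # 0 b'started\n'
```

The trade-off is that output written by the lingering grandchild after the parent exits is discarded, which for a `service ... start` call is usually exactly what you want.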
Awesome, thanks for linking all these issues. Great work! Since it does indeed seem like #9736 is a duplicate, do you mind if we close this one and track the issue there, so we aren't managing multiple issues? If you have any further comments regarding the issue, please comment on that issue if you agree this is a duplicate.
@Ch3LL yes, this issue can be closed. However, my question still remains: can salt be taught to deal with such a situation?
I'm not entirely sure of the answer, or how we would handle that on our end, but I would like to ping @dmurphy18. Is this your area of expertise, @dmurphy18? Do you know if this is something we can handle on salt's side of things, given how mysql is changing their startup files?
@Ch3LL I do not think we can handle mysql changing their startup files; it would be an endless task chasing every release. If there is an error in their init files, as suggested by the dropped resources (leaked file descriptor), then the best course is to file an upstream bug with them to fix it.
@dmurphy18 in my experience this occurs for a couple of services (for example, Java ones) that have init.d scripts not on par with the ones by distro maintainers.
While I agree that services doing it wrong should fix it upstream, would it be possible to have a workaround? For example, a parameter like `do_not_wait` or `detach` in either `service` or `cmd`?
Using `nohup` did not work for me in `cmd` when using a brute `service mysqld start`, and as far as I remember `deb` packages attempt to start services on installation, as opposed to `rpm`.
It also makes (some) sense for `cmd`.
So I'm not a user of salt, but I found this issue while investigating a problem where I'm experiencing the same thing in an inline bash provisioner on a Vagrant box. This helped me narrow down a reproducible test case, and I'm posting it here in case it helps others.
Running as root…
service mysqld stop
service mysqld start 2>&1 | cat
If I remove either the `cat` or the stderr redirect, the issue doesn't occur. The same goes if I alter `/etc/init.d/mysqld` to revert the change @mbochenk noted in #36116 had been made in mysql/mysql-server@684a165#diff-26228c60f7753d111a177fa638fb5838R283.
The real-life scenario in my case is more complicated than just running the output through cat, but it appears that any attempt to run the service call in a way that captures the stderr output into an stdin descriptor triggers the issue.
EDIT: Another change which works around the (apparent) leaked descriptor issue is changing the start call in my test case to the following:
service mysqld start 2> >(cat)
This solved the issue for me, since I essentially just needed to silence a bogus error message it spits out due to the NFS mount we're running in Vagrant for the purpose of storing the mysql data on the host machine.
@davegiles thanks. I did not think of redirecting the service output in `cmd`; I need to try that out.
This will not work for apt-get mysql though - unless it's not installed via `pkg`.
See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214643 ([Bug 214643] net-mgmt/telegraf: service (re)start hangs when scripted: missing option -f with daemon(8)).
I expect we are seeing the same issue here.
To quote the essential summary from that PR:
I came across this issue when trying to manage a service 'telegraf' with SaltStack (i.e. sysutils/py-salt). I would expect the same problem can occur under other management tools like Puppet/Chef/Ansible.
What happens is that when a salt-minion tries to start or restart a telegraf process by spawning a command: 'service telegraf restart', the process just hangs, waiting for a restart to happen, even though the telegraf process does restart successfully and is again running underneath. [...]
$ service telegraf restart | cat (hangs)
[...] So what happens is:
- The process executing the daemon(8) program (as started by the 'service' command) has its stdout directed to a pipe;
- The daemon(8) program disassociates from a controlling terminal but DOES NOT close its stdout, and then forks;
- The forked daemon(8) subprocess inherits the open fd to a pipe;
- The parent daemon(8) then exits, but the child daemon(8) still has a pipe open, connected to cat(1) (or to salt-minion);
- the child daemon(8) process then exec's the telegraf program, which again inherits the pipe as its stdout.
On the SaltStack side I suppose the workaround is to provide an option to the cmd module to avoid it connecting a pipe to stdout and stderr of the spawned process.
To make myself explicit: services which behave as described above (a process terminates but passes its descriptors to other processes before exiting) may indeed not be behaving as nice citizens, but the bug lies entirely in SaltStack.
When SaltStack spawns some process (i.e. runs a command) and creates a pipe to its stdout and stderr, SaltStack must establish a SIGCHLD handler. When the spawned process terminates, any I/O on such pipes must be terminated (even if the pipe is still open on the other side). Promptly reclaiming (waitpid) the exit status of a terminated child process also avoids having zombies lying around.
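The reap-promptly-and-stop-I/O behaviour described above could be sketched like this (a toy illustration of the suggestion, not Salt's actual code): poll the pipe and the child together, and abandon the pipe once the direct child has been reaped and the pipe is drained, even if it never reaches EOF.

```python
import os
import select
import subprocess

# Keep reading the pipe only while the direct child is alive; once it has
# been reaped and the pipe is drained, stop I/O even though a grandchild
# (here: sleep) may still hold the pipe's write end open.
proc = subprocess.Popen(["sh", "-c", "echo ok; sleep 3 & exit 0"],
                        stdout=subprocess.PIPE)
buf = b""
while True:
    readable, _, _ = select.select([proc.stdout], [], [], 0.1)
    if readable:
        data = os.read(proc.stdout.fileno(), 4096)
        if not data:
            break              # EOF: every write end was closed
        buf += data
    elif proc.poll() is not None:
        break                  # child reaped, nothing buffered: give up on EOF
proc.stdout.close()
print(buf)  # b'ok\n'
```

Note that no output is lost: the `elif` branch is only taken when select() reports the pipe empty, so anything the child wrote before exiting has already been read.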
Hey all,
I've been struggling with the issue here when trying to start a process via a custom service file on SUSE 11. Salt was hanging on the `service.running` state, where my start code was `/sbin/startproc $BIN`. I finally solved it using the -q flag on startproc: `/sbin/startproc -q $BIN`.
I think the root cause for me was that the binary I am starting keeps a STDOUT session open, and even though a manual `service BIN start` was exiting correctly, calling the state via salt was not.
Please remove the 'Info Needed' blocking label. The issue has been explained and necessary information provided.
Thanks for everyone's investigation here. There is issue https://github.com/saltstack/salt/issues/9736, and there is discussion there about adding an additional parameter to help work around this issue. I'm going to close this issue so we can track it there. If anyone has any ideas about the additional parameter, please discuss them there. Thanks
Description of Issue/Question
salt-call hangs on the following command:
in processes:
in strace:
in lsof:
Setup
Steps to Reproduce Issue
Hangs forever; the install of the package finishes in the background and the DB is initialized, but apt-get is a zombie.
Versions Report