Closed mruepp closed 4 years ago
Output on the master:
[INFO ] Executing state salt.function for [pkg.upgrade]
[ERROR ] No changes made for pkg.upgrade
ID: upgrade_all_packages
Function: salt.function
Name: pkg.upgrade
Result: False
Comment:
Started: 11:17:44.752874
Duration: 81940.991 ms
Changes:
Event output:
"salt_|-upgrade_all_packages_|-pkg.upgrade_|-function": {
"__id__": "upgrade_all_packages",
"__run_num__": 4,
"__sls__": "orchestrate.prime",
"changes": {},
"command": "No minions responded",
"comment": "",
"duration": 81940.991,
"name": "pkg.upgrade",
"result": false,
"start_time": "11:17:44.752874"
},
Finally, about 2 minutes after the orchestrator has returned failed, we see the upgrade's return event with success true:
salt/job/20181017111744866510/ret/devs0242 {
"_stamp": "2018-10-17T09:21:20.817618",
"cmd": "_return",
"fun": "pkg.upgrade",
"fun_args": [],
"id": "devs0242",
"jid": "20181017111744866510",
"master_id": "salt.dev.sc.intra",
"retcode": 0,
"return": {
"NetworkManager": {
"new": "1:1.10.2-16.el7_5",
"old": "1:1.10.2-13.el7"
},
...
"tzdata": {
"new": "2018e-3.el7",
"old": "2018c-1.el7"
},
"util-linux": {
"new": "2.23.2-52.el7_5.1",
"old": "2.23.2-52.el7"
},
"vdo": {
"new": "6.1.0.168-18",
"old": "6.1.0.149-16"
},
"yum-plugin-fastestmirror": {
"new": "1.1.31-46.el7_5",
"old": "1.1.31-45.el7"
},
"yum-utils": {
"new": "1.1.31-46.el7_5",
"old": "1.1.31-45.el7"
}
},
"success": true
}
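The "about 2 minutes" gap can be checked against the timestamps quoted above. This is a rough sketch and rests on one assumption: the state's `Started` time is the master's local time at UTC+2, while the event `_stamp` is UTC. That offset is inferred from the jid (11:17:44) versus the stamp (09:21:20); it is not stated in the thread.

```python
from datetime import datetime, timedelta

# Assumption: master local time is UTC+2; inferred from the jid, not stated in the thread.
UTC_OFFSET = timedelta(hours=2)

# Values copied from the state output and the return event above.
state_start_local = datetime(2018, 10, 17, 11, 17, 44, 752874)
state_duration = timedelta(milliseconds=81940.991)
minion_return_utc = datetime.strptime(
    "2018-10-17T09:21:20.817618", "%Y-%m-%dT%H:%M:%S.%f"
)

# Moment the orchestrator reported the state as failed.
orchestrator_failed_at = state_start_local + state_duration

# Moment the minion's successful return reached the event bus, in local time.
minion_returned_at = minion_return_utc + UTC_OFFSET

gap = minion_returned_at - orchestrator_failed_at
print(f"{gap.total_seconds():.1f} s")  # prints "134.1 s"
```

Under that assumption the minion's successful return arrives a little over two minutes after the orchestrator has already reported failure.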
Interestingly, the state duration is always around 66600 ms:
[INFO ] Executing state salt.function for [pkg.upgrade]
[ERROR ] No changes made for pkg.upgrade
[INFO ] Completed state [pkg.upgrade] at time 13:53:02.655923 (duration_in_ms=66612.566)
It is also not possible to run the pkg.uptodate state without failure. This is a real blocker:
[INFO ] Running state [upgrade_all_packages] at time 14:11:15.832877
[INFO ] Executing state salt.state for [upgrade_all_packages]
[WARNING ] Output from salt state not highstate
[ERROR ] {u'ret': {'devs0242': False}, u'out': u'highstate'}
[INFO ] Completed state [upgrade_all_packages] at time 14:12:27.605108 (duration_in_ms=71772.231)
Can you provide a full list of packages that are being upgraded?
I have been able to replicate this; I just want to cross-reference what is being upgraded.
I think there may be a package like iptables or something that is causing the minion to be disconnected and not reconnect, so that state.orch thinks that the minion is down and never returning, until the reconnect happens after the upgrade finishes.
Hi, I have now managed to run the state successfully like this:
CentOS.upgrade:
allpkgsuptodate:
  pkg.uptodate:
    - refresh: True
The orchestrator:
upgrade_all_packages:
  salt.state:
    - tgt: {{ salt['pillar.get']('servers', '') }}
    - sls:
      - CentOS.upgrade
    - saltenv: {{ saltenv }}
    - timeout: 400
Also, the next state only runs if I put a sleep in front of it:
sleep_primer_upgrade:
  salt.runner:
    - name: test.sleep
    - arg:
      - 120
Otherwise it will return "Minion did not respond"
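A fixed 120 second sleep works only while the minion happens to be back within that window. One possible alternative is to poll until the minion answers again, with an upper bound. The helper below is a generic sketch, not part of Salt: `wait_until` and its parameters are hypothetical names, and the `probe` callable is left to the caller (for example, something that pings the minion).

```python
import time


def wait_until(probe, timeout, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call probe() every `interval` seconds until it returns True.

    Returns True as soon as the probe succeeds, or False once
    `timeout` seconds have passed without success.
    `clock` and `sleep` are injectable so the helper can be tested
    without really waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical probe: pretend the minion answers on the third attempt.
    attempts = iter([False, False, True])
    print(wait_until(lambda: next(attempts), timeout=120, interval=0.01))
```

Compared with a fixed sleep, this returns as soon as the minion responds and still gives up after a bounded time.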
Here are the packages:
ID: allpkgsuptodate
Function: pkg.uptodate
Result: True
Comment: Upgrade ran successfully
Started: 16:08:31.359253
Duration: 360442.874 ms
Changes:
----------
NetworkManager:
----------
new:
1:1.10.2-16.el7_5
old:
1:1.10.2-13.el7
NetworkManager-libnm:
----------
new:
1:1.10.2-16.el7_5
old:
1:1.10.2-13.el7
NetworkManager-team:
----------
new:
1:1.10.2-16.el7_5
old:
1:1.10.2-13.el7
NetworkManager-tui:
----------
new:
1:1.10.2-16.el7_5
old:
1:1.10.2-13.el7
audit:
----------
new:
2.8.1-3.el7_5.1
old:
2.8.1-3.el7
audit-libs:
----------
new:
2.8.1-3.el7_5.1
old:
2.8.1-3.el7
augeas-libs:
----------
new:
1.4.0-5.el7_5.1
old:
1.4.0-5.el7
bind-libs:
----------
new:
32:9.9.4-61.el7_5.1
old:
32:9.9.4-61.el7
bind-libs-lite:
----------
new:
32:9.9.4-61.el7_5.1
old:
32:9.9.4-61.el7
bind-license:
----------
new:
32:9.9.4-61.el7_5.1
old:
32:9.9.4-61.el7
bind-utils:
----------
new:
32:9.9.4-61.el7_5.1
old:
32:9.9.4-61.el7
binutils:
----------
new:
2.27-28.base.el7_5.1
old:
2.27-27.base.el7
ca-certificates:
----------
new:
2018.2.22-70.0.el7_5
old:
2017.2.20-71.el7
centos-release:
----------
new:
7-5.1804.5.el7.centos
old:
7-5.1804.el7.centos
dhclient:
----------
new:
12:4.2.5-68.el7.centos.1
old:
12:4.2.5-68.el7.centos
dhcp-common:
----------
new:
12:4.2.5-68.el7.centos.1
old:
12:4.2.5-68.el7.centos
dhcp-libs:
----------
new:
12:4.2.5-68.el7.centos.1
old:
12:4.2.5-68.el7.centos
dracut:
----------
new:
033-535.el7_5.1
old:
033-535.el7
dracut-config-rescue:
----------
new:
033-535.el7_5.1
old:
033-535.el7
dracut-network:
----------
new:
033-535.el7_5.1
old:
033-535.el7
e2fsprogs:
----------
new:
1.42.9-12.el7_5
old:
1.42.9-11.el7
e2fsprogs-libs:
----------
new:
1.42.9-12.el7_5
old:
1.42.9-11.el7
firewalld:
----------
new:
0.4.4.4-15.el7_5
old:
0.4.4.4-14.el7
firewalld-filesystem:
----------
new:
0.4.4.4-15.el7_5
old:
0.4.4.4-14.el7
gnupg2:
----------
new:
2.0.22-5.el7_5
old:
2.0.22-4.el7
initscripts:
----------
new:
9.49.41-1.el7_5.2
old:
9.49.41-1.el7
iptables:
----------
new:
1.4.21-24.1.el7_5
old:
1.4.21-24.el7
iwl100-firmware:
----------
new:
39.31.5.1-62.2.el7_5
old:
39.31.5.1-62.el7
iwl1000-firmware:
----------
new:
1:39.31.5.1-62.2.el7_5
old:
1:39.31.5.1-62.el7
iwl105-firmware:
----------
new:
18.168.6.1-62.2.el7_5
old:
18.168.6.1-62.el7
iwl135-firmware:
----------
new:
18.168.6.1-62.2.el7_5
old:
18.168.6.1-62.el7
iwl2000-firmware:
----------
new:
18.168.6.1-62.2.el7_5
old:
18.168.6.1-62.el7
iwl2030-firmware:
----------
new:
18.168.6.1-62.2.el7_5
old:
18.168.6.1-62.el7
iwl3160-firmware:
----------
new:
22.0.7.0-62.2.el7_5
old:
22.0.7.0-62.el7
iwl3945-firmware:
----------
new:
15.32.2.9-62.2.el7_5
old:
15.32.2.9-62.el7
iwl4965-firmware:
----------
new:
228.61.2.24-62.2.el7_5
old:
228.61.2.24-62.el7
iwl5000-firmware:
----------
new:
8.83.5.1_1-62.2.el7_5
old:
8.83.5.1_1-62.el7
iwl5150-firmware:
----------
new:
8.24.2.2-62.2.el7_5
old:
8.24.2.2-62.el7
iwl6000-firmware:
----------
new:
9.221.4.1-62.2.el7_5
old:
9.221.4.1-62.el7
iwl6000g2a-firmware:
----------
new:
17.168.5.3-62.2.el7_5
old:
17.168.5.3-62.el7
iwl6000g2b-firmware:
----------
new:
17.168.5.2-62.2.el7_5
old:
17.168.5.2-62.el7
iwl6050-firmware:
----------
new:
41.28.5.1-62.2.el7_5
old:
41.28.5.1-62.el7
iwl7260-firmware:
----------
new:
22.0.7.0-62.2.el7_5
old:
22.0.7.0-62.el7
iwl7265-firmware:
----------
new:
22.0.7.0-62.2.el7_5
old:
22.0.7.0-62.el7
kernel:
----------
new:
3.10.0-862.el7,3.10.0-862.14.4.el7
old:
3.10.0-862.el7
kernel-tools:
----------
new:
3.10.0-862.14.4.el7
old:
3.10.0-862.el7
kernel-tools-libs:
----------
new:
3.10.0-862.14.4.el7
old:
3.10.0-862.el7
kexec-tools:
----------
new:
2.0.15-13.el7_5.2
old:
2.0.15-13.el7
kmod-kvdo:
----------
new:
6.1.0.181-17.el7_5
old:
6.1.0.153-15.el7
kpartx:
----------
new:
0.4.9-119.el7_5.1
old:
0.4.9-119.el7
krb5-libs:
----------
new:
1.15.1-19.el7
old:
1.15.1-18.el7
libblkid:
----------
new:
2.23.2-52.el7_5.1
old:
2.23.2-52.el7
libcom_err:
----------
new:
1.42.9-12.el7_5
old:
1.42.9-11.el7
libgcc:
----------
new:
4.8.5-28.el7_5.1
old:
4.8.5-28.el7
libgomp:
----------
new:
4.8.5-28.el7_5.1
old:
4.8.5-28.el7
libmount:
----------
new:
2.23.2-52.el7_5.1
old:
2.23.2-52.el7
libss:
----------
new:
1.42.9-12.el7_5
old:
1.42.9-11.el7
libsss_idmap:
----------
new:
1.16.0-19.el7_5.8
old:
1.16.0-19.el7
libsss_nss_idmap:
----------
new:
1.16.0-19.el7_5.8
old:
1.16.0-19.el7
libstdc++:
----------
new:
4.8.5-28.el7_5.1
old:
4.8.5-28.el7
libtomcrypt:
----------
new:
1.17-26.el7
old:
libtommath:
----------
new:
0.42.0-6.el7
old:
libuuid:
----------
new:
2.23.2-52.el7_5.1
old:
2.23.2-52.el7
linux-firmware:
----------
new:
20180220-62.2.git6d51311.el7_5
old:
20180220-62.git6d51311.el7
mariadb-libs:
----------
new:
1:5.5.60-1.el7_5
old:
1:5.5.56-2.el7
microcode_ctl:
----------
new:
2:2.1-29.16.el7_5
old:
2:2.1-29.el7
nspr:
----------
new:
4.19.0-1.el7_5
old:
4.17.0-1.el7
nss:
----------
new:
3.36.0-7.el7_5
old:
3.34.0-4.el7
nss-softokn:
----------
new:
3.36.0-5.el7_5
old:
3.34.0-2.el7
nss-softokn-freebl:
----------
new:
3.36.0-5.el7_5
old:
3.34.0-2.el7
nss-sysinit:
----------
new:
3.36.0-7.el7_5
old:
3.34.0-4.el7
nss-tools:
----------
new:
3.36.0-7.el7_5
old:
3.34.0-4.el7
nss-util:
----------
new:
3.36.0-1.el7_5
old:
3.34.0-2.el7
open-vm-tools:
----------
new:
10.1.10-3.el7_5.1
old:
10.1.10-3.el7
openldap:
----------
new:
2.4.44-15.el7_5
old:
2.4.44-13.el7
procps-ng:
----------
new:
3.3.10-17.el7_5.2
old:
3.3.10-17.el7
python:
----------
new:
2.7.5-69.el7_5
old:
2.7.5-68.el7
python-crypto:
----------
new:
old:
2.6.1-2.el7
python-firewall:
----------
new:
0.4.4.4-15.el7_5
old:
0.4.4.4-14.el7
python-libs:
----------
new:
2.7.5-69.el7_5
old:
2.7.5-68.el7
python-perf:
----------
new:
3.10.0-862.14.4.el7
old:
3.10.0-862.el7
python2-crypto:
----------
new:
2.6.1-15.el7
old:
rsyslog:
----------
new:
8.24.0-16.el7_5.4
old:
8.24.0-16.el7
selinux-policy:
----------
new:
3.13.1-192.el7_5.6
old:
3.13.1-192.el7
selinux-policy-targeted:
----------
new:
3.13.1-192.el7_5.6
old:
3.13.1-192.el7
sos:
----------
new:
3.5-9.el7.centos
old:
3.5-6.el7.centos
sssd-client:
----------
new:
1.16.0-19.el7_5.8
old:
1.16.0-19.el7
sudo:
----------
new:
1.8.19p2-14.el7_5
old:
1.8.19p2-13.el7
systemd:
----------
new:
219-57.el7_5.3
old:
219-57.el7
systemd-libs:
----------
new:
219-57.el7_5.3
old:
219-57.el7
systemd-python:
----------
new:
219-57.el7_5.3
old:
219-57.el7
systemd-sysv:
----------
new:
219-57.el7_5.3
old:
219-57.el7
systemtap-runtime:
----------
new:
3.2-8.el7_5
old:
3.2-4.el7
tuned:
----------
new:
2.9.0-1.el7_5.2
old:
2.9.0-1.el7
tzdata:
----------
new:
2018e-3.el7
old:
2018c-1.el7
util-linux:
----------
new:
2.23.2-52.el7_5.1
old:
2.23.2-52.el7
vdo:
----------
new:
6.1.0.168-18
old:
6.1.0.149-16
yum-plugin-fastestmirror:
----------
new:
1.1.31-46.el7_5
old:
1.1.31-45.el7
yum-utils:
----------
new:
1.1.31-46.el7_5
old:
1.1.31-45.el7
Summary for devs0242
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
Total run time: 360.443 s
Yeah, it seems like one of those packages, possibly NetworkManager or iptables, is causing the minion to be disconnected from the master's publish bus. The minion can still connect and send its return, but the second command cannot be sent through until the minion reconnect event has happened on the publish bus.
Unfortunately I am having a hard time replicating this issue. Are you able to go through the packages and figure out whether one specific package is causing this disconnect?
Thanks, Daniel
I would bet money on NetworkManager restarting the network on an update, causing the minion to disconnect from the master's publish bus. That means the minion will stop responding to the saltutil.is_running requests that the LocalClient uses to know that the minions are still running jobs, and eventually this leads to the "timeout" and the "Minion not responding" message.
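The mechanism described here can be illustrated with a toy model. This is a deliberately simplified sketch and not Salt's actual LocalClient logic: the function name, its parameters, and all numbers are hypothetical. It only shows why a job that eventually succeeds can still be reported as "no response" if the minion is unreachable when the master checks on it.

```python
def master_verdict(return_at, reachable, timeout=5, probe_interval=10):
    """Toy model of a master waiting for one minion's job return.

    return_at      -- second at which the minion's return would arrive
    reachable(t)   -- True if the minion answers a liveness probe at second t
    timeout        -- seconds the master waits before it starts probing
    probe_interval -- seconds between liveness probes
    All names and defaults are illustrative, not Salt settings.
    """
    t = timeout
    while return_at > t:
        # The return has not arrived yet, so ask whether the job is still running.
        if not reachable(t):
            # An unanswered probe is treated as "minion is gone".
            return ("no response", t)
        t += probe_interval
    return ("returned", return_at)


# Hypothetical run: the upgrade finishes after 216 s, but a network restart
# leaves the minion unreachable on the publish bus from second 60 to 200.
print(master_verdict(216, lambda t: not (60 <= t < 200)))

# Same job with no disconnect.
print(master_verdict(216, lambda t: True))
```

In the first call the master gives up at the first probe that falls inside the outage, even though the job would have returned successfully later.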
Try skipping the NetworkManager upgrade, and see if you still see this problem, and instead upgrade NetworkManager separately.
That may be, but the problem is that we use it in a primer orchestrator which runs on minion accept. We cannot determine which parts of the OS will be upgraded. So this inconsistent behaviour is something that, in my opinion, Salt should address. For now we use timeout in the state, and that works.
@saltstack/team-core can I get some opinions on this?
Thanks Daniel
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Description of Issue/Question
We run an orchestrator which ultimately ends in a
The orchestrator runs fine, but when we upgrade a large number of packages, it finishes with both states failed. What really bothers us is the upgrade state, because when watching the events with:
salt-run state.event pretty=True
the events continue after the states have returned failed; after some time the updates and the reboot happen, even though the states failed.
So somehow the state returns failed before it has finished, while the upgrade is still running.
An orchestration is expected to run all states in a defined order and to wait until each state has actually finished.
We tried increasing the
tried to apply a sleep state after the upgrade state, without success.
So why does pkg.upgrade report failed to the event system while it has not finished, even though it eventually finishes successfully?
Btw, we also noticed a lot of
Minion did not return. [No response]
when running the command on the salt master with: salt 'minion' pkg.upgrade
as well. If you apply the state a second time, i.e. when the process does not take as long, it returns successfully.
It's just an ordinary CentOS system, upgrading from 1803 to 1804: about 250 MB, which takes about 3-5 min when issuing
yum upgrade
Versions Report
salt-minion --versions-report
Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: Not Installed
    cherrypy: Not Installed
    dateutil: Not Installed
    docker-py: 1.10.6
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.7.2
    libgit2: Not Installed
    libnacl: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.5.6
    mysql-python: Not Installed
    pycparser: Not Installed
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: Not Installed
    Python: 2.7.5 (default, Jul 13 2018, 13:06:57)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: centos 7.5.1804 Core
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.11.6.el7.x86_64
    system: Linux
    version: CentOS Linux 7.5.1804 Core
salt --versions-report
Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: 1.6.0
    cherrypy: Not Installed
    dateutil: Not Installed
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.7.2
    libgit2: 0.26.3
    libnacl: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.5.6
    mysql-python: Not Installed
    pycparser: 2.14
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: 0.26.4
    Python: 2.7.5 (default, Jul 13 2018, 13:06:57)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: centos 7.5.1804 Core
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.14.4.el7.x86_64
    system: Linux
    version: CentOS Linux 7.5.1804 Core