saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

pkg.upgrade does not return (or returns too fast) #50084

Closed mruepp closed 4 years ago

mruepp commented 6 years ago

Description of Issue/Question

We run an orchestration that ultimately ends with:

upgrade-packages:
  salt.function:
    - name: pkg.upgrade
    - tgt: {{ somepillar }}
    - saltenv: {{ somesaltenv }}

reboot-system:
  salt.function:
    - name: system.reboot
    - ...

The orchestration runs fine, but when a large number of packages are upgraded, both states finish as failed. The upgrade state is the real concern, because when watching the events with:

salt-run state.event pretty=True

the events keep arriving after the states have been reported as failed. Some time later the updates and the reboot do happen, even though the states failed.

So the state returns a failed result while the upgrade is still running on the minion.

When running an orchestration, we expect all states to run in the defined order, and each state to wait until the previous one has really finished.

We tried increasing the following settings:

timeout:
gather_job_timeout: 

We also tried adding a sleep state after the upgrade state, without success.

So why does pkg.upgrade report failure to the event system while it is still running, even though it eventually finishes successfully?

By the way, we also often see "Minion did not return. [No response]" when running the command directly on the Salt master with salt 'minion' pkg.upgrade. If you run it a second time, when the process does not take as long, it returns successfully.

It's just an ordinary CentOS system, upgrading from 1803 to 1804. The upgrade is roughly 250 MB and takes about 3-5 minutes when running yum upgrade directly.

Versions Report

salt-minion --versions-report

Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: Not Installed
    cherrypy: Not Installed
    dateutil: Not Installed
    docker-py: 1.10.6
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.7.2
    libgit2: Not Installed
    libnacl: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.5.6
    mysql-python: Not Installed
    pycparser: Not Installed
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: Not Installed
    Python: 2.7.5 (default, Jul 13 2018, 13:06:57)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: centos 7.5.1804 Core
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.11.6.el7.x86_64
    system: Linux
    version: CentOS Linux 7.5.1804 Core

salt --versions-report

Salt Version:
    Salt: 2018.3.2

Dependency Versions:
    cffi: 1.6.0
    cherrypy: Not Installed
    dateutil: Not Installed
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    ioflo: Not Installed
    Jinja2: 2.7.2
    libgit2: 0.26.3
    libnacl: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.5.6
    mysql-python: Not Installed
    pycparser: 2.14
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: 0.26.4
    Python: 2.7.5 (default, Jul 13 2018, 13:06:57)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: centos 7.5.1804 Core
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-862.14.4.el7.x86_64
    system: Linux
    version: CentOS Linux 7.5.1804 Core

mruepp commented 6 years ago

Output on the master:

[INFO    ] Executing state salt.function for [pkg.upgrade]
[ERROR   ] No changes made for pkg.upgrade
         ID: upgrade_all_packages
    Function: salt.function
        Name: pkg.upgrade
      Result: False
     Comment:
     Started: 11:17:44.752874
    Duration: 81940.991 ms
     Changes:

Event output:

              "salt_|-upgrade_all_packages_|-pkg.upgrade_|-function": {
                    "__id__": "upgrade_all_packages",
                    "__run_num__": 4,
                    "__sls__": "orchestrate.prime",
                    "changes": {},
                    "command": "No minions responded",
                    "comment": "",
                    "duration": 81940.991,
                    "name": "pkg.upgrade",
                    "result": false,
                    "start_time": "11:17:44.752874"
                },

About 2 minutes after the orchestrator reports failure, we see the return event for the upgrade with success true:

salt/job/20181017111744866510/ret/devs0242      {
    "_stamp": "2018-10-17T09:21:20.817618",
    "cmd": "_return",
    "fun": "pkg.upgrade",
    "fun_args": [],
    "id": "devs0242",
    "jid": "20181017111744866510",
    "master_id": "salt.dev.sc.intra",
    "retcode": 0,
    "return": {
        "NetworkManager": {
            "new": "1:1.10.2-16.el7_5",
            "old": "1:1.10.2-13.el7"
        },
...
        "tzdata": {                         
            "new": "2018e-3.el7",           
            "old": "2018c-1.el7"            
        },                                  
        "util-linux": {                     
            "new": "2.23.2-52.el7_5.1",     
            "old": "2.23.2-52.el7"          
        },                                  
        "vdo": {                            
            "new": "6.1.0.168-18",          
            "old": "6.1.0.149-16"           
        },                                  
        "yum-plugin-fastestmirror": {       
            "new": "1.1.31-46.el7_5",       
            "old": "1.1.31-45.el7"          
        },                                  
        "yum-utils": {                      
            "new": "1.1.31-46.el7_5",       
            "old": "1.1.31-45.el7"          
        }                                   
    },                                      
    "success": true                         
}                                           
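
The timing in the two outputs above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: it assumes that the first 14 digits of the jid encode the job's start in the master's local time, that "_stamp" is in UTC, and that the master runs at UTC+2. The timezone is not stated anywhere in the thread, so treat the offset as a guess.

```python
from datetime import datetime, timedelta

# Values copied from the master and event output above.
jid = "20181017111744866510"                 # first 14 digits: job start (assumed local time)
return_stamp = "2018-10-17T09:21:20.817618"  # "_stamp" of the return event (assumed UTC)
state_duration_ms = 81940.991                # "duration" of the failed salt.function state

# ASSUMPTION: the master's local time is UTC+2; the thread does not say so.
local_offset = timedelta(hours=2)

job_start_utc = datetime.strptime(jid[:14], "%Y%m%d%H%M%S") - local_offset
returned_utc = datetime.strptime(return_stamp, "%Y-%m-%dT%H:%M:%S.%f")

# How long the minion actually worked on the job.
job_runtime_s = (returned_utc - job_start_utc).total_seconds()
# How long after the state was marked failed the real return arrived.
gap_after_failure_s = job_runtime_s - state_duration_ms / 1000.0

print(f"minion job ran for about {job_runtime_s:.0f} s")
print(f"return arrived about {gap_after_failure_s:.0f} s after the state was marked failed")
```

Under these assumptions the job ran for about 217 s on the minion, while the state gave up after about 82 s, so the successful return arrived roughly 135 s later. That matches the "about 2 minutes" observed in the event stream.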
mruepp commented 6 years ago

Interestingly, the state duration is always 66600 ms:

[INFO    ] Executing state salt.function for [pkg.upgrade]
[ERROR   ] No changes made for pkg.upgrade
[INFO    ] Completed state [pkg.upgrade] at time 13:53:02.655923 (duration_in_ms=66612.566)

mruepp commented 6 years ago

It is also not possible to run the pkg.uptodate state without failure. This is a real blocker:

[INFO    ] Running state [upgrade_all_packages] at time 14:11:15.832877
[INFO    ] Executing state salt.state for [upgrade_all_packages]
[WARNING ] Output from salt state not highstate
[ERROR   ] {u'ret': {'devs0242': False}, u'out': u'highstate'}
[INFO    ] Completed state [upgrade_all_packages] at time 14:12:27.605108 (duration_in_ms=71772.231)
gtmanfred commented 6 years ago

Can you provide a full list of packages that are being upgraded?

I have been able to replicate this; I just want to cross-reference what is being upgraded.

I think there may be a package like iptables or something similar that causes the minion to be disconnected and not reconnect, so that state.orch thinks that the minion is down and never returning, until the reconnect happens after the upgrade finishes.

mruepp commented 6 years ago

Hi, I have now managed to run the state successfully like this:

CentOS.upgrade:

allpkgsuptodate:
  pkg.uptodate:
    - refresh: True

The orchestrator:

upgrade_all_packages:
  salt.state:
    - tgt: {{ salt['pillar.get']('servers', '') }}
    - sls:
      - CentOS.upgrade
    - saltenv: {{ saltenv }}
    - timeout: 400

Also, the next state only runs if I put a sleep after the upgrade state:

sleep_primer_upgrade:
  salt.runner:
    - name: test.sleep
    - arg: 
      - 120

Otherwise it will return "Minion did not respond"
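
A fixed sleep like the one above has to be long enough for the slowest case. A common general alternative is to poll a readiness check until a deadline. The sketch below only illustrates that pattern in plain Python; it is not a Salt API, and `wait_until` and the `check` callable are hypothetical names introduced here for illustration.

```python
import time


def wait_until(check, timeout_s, interval_s=5.0):
    """Call check() repeatedly until it returns True or timeout_s has elapsed.

    `check` is a hypothetical, caller-supplied callable (for example, something
    that tests whether a minion answers again). Returns True on success and
    False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in check that becomes ready on the third call.
    calls = []

    def ready_on_third_call():
        calls.append(1)
        return len(calls) >= 3

    print(wait_until(ready_on_third_call, timeout_s=1.0, interval_s=0.01))
```

Compared with a fixed 120 second sleep, this returns as soon as the condition holds and fails explicitly when it never does.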

Here are the packages:

                        ID: allpkgsuptodate
                  Function: pkg.uptodate
                    Result: True
                   Comment: Upgrade ran successfully
                   Started: 16:08:31.359253
                  Duration: 360442.874 ms
                   Changes:   
                            ----------
                            NetworkManager:
                                ----------
                                new:
                                    1:1.10.2-16.el7_5
                                old:
                                    1:1.10.2-13.el7
                            NetworkManager-libnm:
                                ----------
                                new:
                                    1:1.10.2-16.el7_5
                                old:
                                    1:1.10.2-13.el7
                            NetworkManager-team:
                                ----------
                                new:
                                    1:1.10.2-16.el7_5
                                old:
                                    1:1.10.2-13.el7
                            NetworkManager-tui:
                                ----------
                                new:
                                    1:1.10.2-16.el7_5
                                old:
                                    1:1.10.2-13.el7
                            audit:
                                ----------
                                new:
                                    2.8.1-3.el7_5.1
                                old:
                                    2.8.1-3.el7
                            audit-libs:
                                ----------
                                new:
                                    2.8.1-3.el7_5.1
                                old:
                                    2.8.1-3.el7
                            augeas-libs:
                                ----------
                                new:
                                    1.4.0-5.el7_5.1
                                old:
                                    1.4.0-5.el7
                            bind-libs:
                                ----------
                                new:
                                    32:9.9.4-61.el7_5.1
                                old:
                                    32:9.9.4-61.el7
                            bind-libs-lite:
                                ----------
                                new:
                                    32:9.9.4-61.el7_5.1
                                old:
                                    32:9.9.4-61.el7
                            bind-license:
                                ----------
                                new:
                                    32:9.9.4-61.el7_5.1
                                old:
                                    32:9.9.4-61.el7
                            bind-utils:
                                ----------
                                new:
                                    32:9.9.4-61.el7_5.1
                                old:
                                    32:9.9.4-61.el7
                            binutils:
                                ----------
                                new:
                                    2.27-28.base.el7_5.1
                                old:
                                    2.27-27.base.el7
                            ca-certificates:
                                ----------
                                new:
                                    2018.2.22-70.0.el7_5
                                old:
                                    2017.2.20-71.el7
                            centos-release:
                                ----------
                                new:
                                    7-5.1804.5.el7.centos
                                old:
                                    7-5.1804.el7.centos
                            dhclient:
                                ----------
                                new:
                                    12:4.2.5-68.el7.centos.1
                                old:
                                    12:4.2.5-68.el7.centos
                            dhcp-common:
                                ----------
                                new:
                                    12:4.2.5-68.el7.centos.1
                                old:
                                    12:4.2.5-68.el7.centos
                            dhcp-libs:
                                ----------
                                new:
                                    12:4.2.5-68.el7.centos.1
                                old:
                                    12:4.2.5-68.el7.centos
                            dracut:
                                ----------
                                new:
                                    033-535.el7_5.1
                                old:
                                    033-535.el7
                            dracut-config-rescue:
                                ----------
                                new:
                                    033-535.el7_5.1
                                old:
                                    033-535.el7
                            dracut-network:
                                ----------
                                new:
                                    033-535.el7_5.1
                                old:
                                    033-535.el7
                            e2fsprogs:
                                ----------
                                new:
                                    1.42.9-12.el7_5
                                old:
                                    1.42.9-11.el7
                            e2fsprogs-libs:
                                ----------
                                new:
                                    1.42.9-12.el7_5
                                old:
                                    1.42.9-11.el7
                            firewalld:
                                ----------
                                new:
                                    0.4.4.4-15.el7_5
                                old:
                                    0.4.4.4-14.el7
                            firewalld-filesystem:
                                ----------
                                new:
                                    0.4.4.4-15.el7_5
                                old:
                                    0.4.4.4-14.el7
                            gnupg2:
                                ----------
                                new:
                                    2.0.22-5.el7_5
                                old:
                                    2.0.22-4.el7
                            initscripts:
                                ----------
                                new:
                                    9.49.41-1.el7_5.2
                                old:
                                    9.49.41-1.el7
                            iptables:
                                ----------
                                new:
                                    1.4.21-24.1.el7_5
                                old:
                                    1.4.21-24.el7
                            iwl100-firmware:
                                ----------
                                new:
                                    39.31.5.1-62.2.el7_5
                                old:
                                    39.31.5.1-62.el7
                            iwl1000-firmware:
                                ----------
                                new:
                                    1:39.31.5.1-62.2.el7_5
                                old:
                                    1:39.31.5.1-62.el7
                            iwl105-firmware:
                                ----------
                                new:
                                    18.168.6.1-62.2.el7_5
                                old:
                                    18.168.6.1-62.el7
                            iwl135-firmware:
                                ----------
                                new:
                                    18.168.6.1-62.2.el7_5
                                old:
                                    18.168.6.1-62.el7
                            iwl2000-firmware:
                                ----------
                                new:
                                    18.168.6.1-62.2.el7_5
                                old:
                                    18.168.6.1-62.el7
                            iwl2030-firmware:
                                ----------
                                new:
                                    18.168.6.1-62.2.el7_5
                                old:
                                    18.168.6.1-62.el7
                            iwl3160-firmware:
                                ----------
                                new:
                                    22.0.7.0-62.2.el7_5
                                old:
                                    22.0.7.0-62.el7
                            iwl3945-firmware:
                                ----------
                                new:
                                    15.32.2.9-62.2.el7_5
                                old:
                                    15.32.2.9-62.el7
                            iwl4965-firmware:
                                ----------
                                new:
                                    228.61.2.24-62.2.el7_5
                                old:
                                    228.61.2.24-62.el7
                            iwl5000-firmware:
                                ----------
                                new:
                                    8.83.5.1_1-62.2.el7_5
                                old:
                                    8.83.5.1_1-62.el7
                            iwl5150-firmware:
                                ----------
                                new:
                                    8.24.2.2-62.2.el7_5
                                old:
                                    8.24.2.2-62.el7
                            iwl6000-firmware:
                                ----------
                                new:
                                    9.221.4.1-62.2.el7_5
                                old:
                                    9.221.4.1-62.el7
                            iwl6000g2a-firmware:
                                ----------
                                new:
                                    17.168.5.3-62.2.el7_5
                                old:
                                    17.168.5.3-62.el7
                            iwl6000g2b-firmware:
                                ----------
                                new:
                                    17.168.5.2-62.2.el7_5
                                old:
                                    17.168.5.2-62.el7
                            iwl6050-firmware:
                                ----------
                                new:
                                    41.28.5.1-62.2.el7_5
                                old:
                                    41.28.5.1-62.el7
                            iwl7260-firmware:
                                ----------
                                new:
                                    22.0.7.0-62.2.el7_5
                                old:
                                    22.0.7.0-62.el7
                            iwl7265-firmware:
                                ----------
                                new:
                                    22.0.7.0-62.2.el7_5
                                old:
                                    22.0.7.0-62.el7
                            kernel:
                                ----------
                                new:
                                    3.10.0-862.el7,3.10.0-862.14.4.el7
                                old:
                                    3.10.0-862.el7
                            kernel-tools:
                                ----------
                                new:
                                    3.10.0-862.14.4.el7
                                old:
                                    3.10.0-862.el7
                            kernel-tools-libs:
                                ----------
                                new:
                                    3.10.0-862.14.4.el7
                                old:
                                    3.10.0-862.el7
                            kexec-tools:
                                ----------
                                new:
                                    2.0.15-13.el7_5.2
                                old:
                                    2.0.15-13.el7
                            kmod-kvdo:
                                ----------
                                new:
                                    6.1.0.181-17.el7_5
                                old:
                                    6.1.0.153-15.el7
                            kpartx:
                                ----------
                                new:
                                    0.4.9-119.el7_5.1
                                old:
                                    0.4.9-119.el7
                            krb5-libs:
                                ----------
                                new:
                                    1.15.1-19.el7
                                old:
                                    1.15.1-18.el7
                            libblkid:
                                ----------
                                new:
                                    2.23.2-52.el7_5.1
                                old:
                                    2.23.2-52.el7
                            libcom_err:
                                ----------
                                new:
                                    1.42.9-12.el7_5
                                old:
                                    1.42.9-11.el7
                            libgcc:
                                ----------
                                new:
                                    4.8.5-28.el7_5.1
                                old:
                                    4.8.5-28.el7
                            libgomp:
                                ----------
                                new:
                                    4.8.5-28.el7_5.1
                                old:
                                    4.8.5-28.el7
                            libmount:
                                ----------
                                new:
                                    2.23.2-52.el7_5.1
                                old:
                                    2.23.2-52.el7
                            libss:
                                ----------
                                new:
                                    1.42.9-12.el7_5
                                old:
                                    1.42.9-11.el7
                            libsss_idmap:
                                ----------
                                new:
                                    1.16.0-19.el7_5.8
                                old:
                                    1.16.0-19.el7
                            libsss_nss_idmap:
                                ----------
                                new:
                                    1.16.0-19.el7_5.8
                                old:
                                    1.16.0-19.el7
                            libstdc++:
                                ----------
                                new:
                                    4.8.5-28.el7_5.1
                                old:
                                    4.8.5-28.el7
                            libtomcrypt:
                                ----------
                                new:
                                    1.17-26.el7
                                old:
                            libtommath:
                                ----------
                                new:
                                    0.42.0-6.el7
                                old:
                            libuuid:
                                ----------
                                new:
                                    2.23.2-52.el7_5.1
                                old:
                                    2.23.2-52.el7
                            linux-firmware:
                                ----------
                                new:
                                    20180220-62.2.git6d51311.el7_5
                                old:
                                    20180220-62.git6d51311.el7
                            mariadb-libs:
                                ----------
                                new:
                                    1:5.5.60-1.el7_5
                                old:
                                    1:5.5.56-2.el7
                            microcode_ctl:
                                ----------
                                new:
                                    2:2.1-29.16.el7_5
                                old:
                                    2:2.1-29.el7
                            nspr:
                                ----------
                                new:
                                    4.19.0-1.el7_5
                                old:
                                    4.17.0-1.el7
                            nss:
                                ----------
                                new:
                                    3.36.0-7.el7_5
                                old:
                                    3.34.0-4.el7
                            nss-softokn:
                                ----------
                                new:
                                    3.36.0-5.el7_5
                                old:
                                    3.34.0-2.el7
                            nss-softokn-freebl:
                                ----------
                                new:
                                    3.36.0-5.el7_5
                                old:
                                    3.34.0-2.el7
                            nss-sysinit:
                                ----------
                                new:
                                    3.36.0-7.el7_5
                                old:
                                    3.34.0-4.el7
                            nss-tools:
                                ----------
                                new:
                                    3.36.0-7.el7_5
                                old:
                                    3.34.0-4.el7
                            nss-util:
                                ----------
                                new:
                                    3.36.0-1.el7_5
                                old:
                                    3.34.0-2.el7
                            open-vm-tools:
                                ----------
                                new:
                                    10.1.10-3.el7_5.1
                                old:
                                    10.1.10-3.el7
                            openldap:
                                ----------
                                new:
                                    2.4.44-15.el7_5
                                old:
                                    2.4.44-13.el7
                            procps-ng:
                                ----------
                                new:
                                    3.3.10-17.el7_5.2
                                old:
                                    3.3.10-17.el7
                            python:
                                ----------
                                new:
                                    2.7.5-69.el7_5
                                old:
                                    2.7.5-68.el7
                            python-crypto:
                                ----------
                                new:
                                old:
                                    2.6.1-2.el7
                            python-firewall:
                                ----------
                                new:
                                    0.4.4.4-15.el7_5
                                old:
                                    0.4.4.4-14.el7
                            python-libs:
                                ----------
                                new:
                                    2.7.5-69.el7_5
                                old:
                                    2.7.5-68.el7
                            python-perf:
                                ----------
                                new:
                                    3.10.0-862.14.4.el7
                                old:
                                    3.10.0-862.el7
                            python2-crypto:
                                ----------
                                new:
                                    2.6.1-15.el7
                                old:
                            rsyslog:
                                ----------
                                new:
                                    8.24.0-16.el7_5.4
                                old:
                                    8.24.0-16.el7
                            selinux-policy:
                                ----------
                                new:
                                    3.13.1-192.el7_5.6
                                old:
                                    3.13.1-192.el7
                            selinux-policy-targeted:
                                ----------
                                new:
                                    3.13.1-192.el7_5.6
                                old:
                                    3.13.1-192.el7
                            sos:
                                ----------
                                new:
                                    3.5-9.el7.centos
                                old:
                                    3.5-6.el7.centos
                            sssd-client:
                                ----------
                                new:
                                    1.16.0-19.el7_5.8
                                old:
                                    1.16.0-19.el7
                            sudo:
                                ----------
                                new:
                                    1.8.19p2-14.el7_5
                                old:
                                    1.8.19p2-13.el7
                            systemd:
                                ----------
                                new:
                                    219-57.el7_5.3
                                old:
                                    219-57.el7
                            systemd-libs:
                                ----------
                                new:
                                    219-57.el7_5.3
                                old:
                                    219-57.el7
                            systemd-python:
                                ----------
                                new:
                                    219-57.el7_5.3
                                old:
                                    219-57.el7
                            systemd-sysv:
                                ----------
                                new:
                                    219-57.el7_5.3
                                old:
                                    219-57.el7
                            systemtap-runtime:
                                ----------
                                new:
                                    3.2-8.el7_5
                                old:
                                    3.2-4.el7
                            tuned:
                                ----------
                                new:
                                    2.9.0-1.el7_5.2
                                old:
                                    2.9.0-1.el7
                            tzdata:
                                ----------
                                new:
                                    2018e-3.el7
                                old:
                                    2018c-1.el7
                            util-linux:
                                ----------
                                new:
                                    2.23.2-52.el7_5.1
                                old:
                                    2.23.2-52.el7
                            vdo:
                                ----------
                                new:
                                    6.1.0.168-18
                                old:
                                    6.1.0.149-16
                            yum-plugin-fastestmirror:
                                ----------
                                new:
                                    1.1.31-46.el7_5
                                old:
                                    1.1.31-45.el7
                            yum-utils:
                                ----------
                                new:
                                    1.1.31-46.el7_5
                                old:
                                    1.1.31-45.el7

              Summary for devs0242
              ------------
              Succeeded: 1 (changed=1)
              Failed:    0
              ------------
              Total states run:     1
              Total run time: 360.443 s
gtmanfred commented 6 years ago

Yeah, it seems like one of those packages, possibly NetworkManager or iptables, is causing the minion to be disconnected from the master's publish bus. The minion is still able to connect and return, but the second command cannot be sent through until the minion reconnect event happens on the publish bus.

Unfortunately, I am having a hard time replicating this issue. Are you able to go through and figure out whether one specific package is causing this disconnect?

Thanks, Daniel

gtmanfred commented 6 years ago

I would bet money on NetworkManager restarting the network during the update, which disconnects the minion from the master's publish bus. The minion then stops responding to the saltutil.is_running requests that the LocalClient uses to know that minions are still running jobs, and eventually this leads to the "timeout" and the "Minion did not return" message.

Try skipping NetworkManager in the main upgrade and upgrading it separately, then see whether you still hit this problem.
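One way to try this from the orchestrator is sketched below. It is only an illustration, not a tested fix: the state IDs are made up, the `tgt` placeholder is copied from the original report, and it falls back to plain `yum` through `cmd.run` (using yum's standard `--exclude` option) rather than relying on any particular `pkg.upgrade` argument.

```yaml
# Step 1: upgrade everything except the NetworkManager packages.
upgrade-without-networkmanager:
  salt.function:
    - name: cmd.run
    - tgt: {{ somepillar }}
    - arg:
      - yum -y upgrade --exclude='NetworkManager*'

# Step 2: upgrade NetworkManager on its own, only after step 1 has returned.
upgrade-networkmanager-only:
  salt.function:
    - name: cmd.run
    - tgt: {{ somepillar }}
    - arg:
      - yum -y upgrade 'NetworkManager*'
    - require:
      - salt: upgrade-without-networkmanager
```

If the first step returns cleanly and only the second one produces the "Minion did not return" message, that would point at NetworkManager as the package causing the disconnect.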

mruepp commented 6 years ago

That may be, but the problem is that we use it in a primer orchestrator which runs on minion accept, so we cannot determine which parts of the OS will be upgraded. In my opinion, this inconsistent behaviour is something Salt should address. For now we set a timeout in the state, and that works fine.
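For anyone looking for the workaround, a minimal sketch of a timeout set on the orchestration step might look like the following. The thread does not show the actual state or value that was used, so the 900 seconds here is purely illustrative; `timeout` is the per-state argument accepted by `salt.function`.

```yaml
upgrade-packages:
  salt.function:
    - name: pkg.upgrade
    - tgt: {{ somepillar }}
    - saltenv: {{ somesaltenv }}
    # Illustrative value (seconds), not taken from the thread; it should
    # comfortably exceed the longest upgrade you expect.
    - timeout: 900
```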

gtmanfred commented 6 years ago

@saltstack/team-core can I get some opinions on this?

Thanks, Daniel

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.