saltstack / salt


Minion to Master communication blocked #66782

Closed: dnessett closed this issue 3 months ago

dnessett commented 3 months ago

Description

I am experiencing the same problem that was reported in issue 55103. That issue was closed because the Salt versions involved are no longer supported, but the problem still exists; it is a long-standing problem first reported in 2019.

Setup

This problem requires no special setup and is observed with the following simple salt command:

dnessett@homelserv:~/Desktop$ sudo salt 'MOLS-T-0' test.ping
MOLS-T-0:
    True
dnessett@homelserv:~/Desktop$ sudo salt 'MOLS-T-0' saltutil.cmd '*' grains.items
MOLS-T-0:
    ERROR executing 'saltutil.cmd': The salt master could not be contacted. Is master running?
ERROR: Minions returned with non-zero exit code

In working through this problem, it was suggested that it had something to do with the Master and the Minion running different Salt versions. However, the problem shown above occurs when both the Master and the Minion are at 3006.9.

Furthermore, the problem occurs in other situations (again, with both the master and minion at 3006.9). For example, consider the following .sls file in /srv/salt:

hosts-restart:
  salt.function:
    - name: system.reboot
    - tgt: '*'

hosts-wait-for-reboot:
  salt.wait_for_event:
    - name: salt/minion/*/start
    - id_list: {{ grains.id }}
    - timeout: 600
    - require:
      - salt: hosts-restart

Here is the result of running the following state.apply command:

sudo salt 'MOLS-T-0' state.apply test-reboot
MOLS-T-0:
----------
          ID: hosts-restart
    Function: salt.function
        Name: system.reboot
      Result: False
     Comment: The salt master could not be contacted. Is master running?
     Started: 15:14:15.606157
    Duration: 110.519 ms
     Changes:   
----------
          ID: hosts-wait-for-reboot
    Function: salt.wait_for_event
        Name: salt/minion/*/start
      Result: False
     Comment: One or more requisite failed: test-reboot.hosts-restart
     Started: 15:14:15.717001
    Duration: 0.004 ms
     Changes:   

In both cases, the commands fail with an indication that the minion cannot contact the master.


Steps to Reproduce the behavior

The two commands that reproduce the behavior are given in the description above.

Expected behavior

In the first case, the minion should run grains.items and return the result to the master. In the second case, the minion should reboot, and the master should wait for it to come back up and then report the successful reboot.

Screenshots

The problem is illustrated using commands executed in a terminal.

Versions Report

dnessett@homelserv:~$ salt-master --version
salt-master 3006.9 (Sulfur)

dnessett@MOLS-T-0:~$ salt-minion --version
salt-minion 3006.9 (Sulfur)

I don't know what is required here.


welcome[bot] commented 3 months ago

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey. Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar. If you have additional questions, email us at saltproject@vmware.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!

dmurphy18 commented 3 months ago

@dnessett I am wondering what you are trying to achieve with the command sudo salt 'MOLS-T-0' saltutil.cmd '*' grains.items, which does not make sense to me.

saltutil is typically a Salt runner command run from the master with salt-run, for example saltutil.sync_all: https://docs.saltproject.io/en/3006/ref/runners/all/salt.runners.saltutil.html#salt.runners.saltutil.sync_all

I am unable to find any support for saltutil.cmd in the code, https://github.com/saltstack/salt/blob/3006.x/salt/runners/saltutil.py

Lastly, if you want to see the grains for all salt-minions connected to a salt-master, salt '*' grains.items should work for you.
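
For illustration, here is a minimal sketch of that master-side command and the shape of its output (the grain values shown are hypothetical):

sudo salt '*' grains.items
MOLS-T-0:
    ----------
    id:
        MOLS-T-0
    os:
        Mint
    saltversion:
        3006.9

Because grains.items is an ordinary execution module, the master targets the minions directly; no saltutil indirection is needed.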

I suggest reading the following books on leveraging Salt: https://www.amazon.com/Learning-SaltStack-Second-Colton-Myers/dp/1785881906 and https://www.amazon.com/Mastering-SaltStack-Second-Joseph-Hall/dp/1786467399

Both books are written by former Salt Core Team members. I suggest closing this issue, since it appears to stem from incorrect usage of Salt.

dnessett commented 3 months ago

@dmurphy18 : The command

sudo salt 'MOLS-T-0' saltutil.cmd '*' grains.items

just illustrates the problem. It comes from the first report of issue 55103 in 2019. I used it because it is less complicated than what I was trying to do when I ran up against this problem, which was (in test-reboot.sls in /srv/salt):

hosts-restart:
  salt.function:
    - name: system.reboot
    - tgt: '*'

hosts-wait-for-reboot:
  salt.wait_for_event:
    - name: salt/minion/*/start
    - id_list: {{ grains.id }}
    - timeout: 600
    - require:
      - salt: hosts-restart

and then:

sudo salt 'MOLS-T-0' state.apply test-reboot
MOLS-T-0:
----------
          ID: hosts-restart
    Function: salt.function
        Name: system.reboot
      Result: False
     Comment: The salt master could not be contacted. Is master running?
     Started: 15:14:15.606157
    Duration: 110.519 ms
     Changes:   
----------
          ID: hosts-wait-for-reboot
    Function: salt.wait_for_event
        Name: salt/minion/*/start
      Result: False
     Comment: One or more requisite failed: test-reboot.hosts-restart
     Started: 15:14:15.717001
    Duration: 0.004 ms
     Changes:   

I want to reboot the minion after executing some state modifications in an .sls file and then wait for the reboot to complete. I used test-reboot to see whether I could reboot the minion from a state file and wait for it to finish. It is itself a modification of the .sls code found here. To be perfectly clear, test-reboot.sls is simply a way to see if this is possible. If it is, I would then incorporate the salt code given in test-reboot.sls into the ultimate target .sls file.

Closing this issue is not, in my view, advisable, since this problem has existed for 5 years without being fixed. If there is another way to reboot a minion from a state file that works, I would be happy to use it, but I could not find such a solution.

dmurphy18 commented 3 months ago

@dnessett So a bad command from an issue in 2019 is still a bad command (even though it has been found to work, there are better ways to get grains.items), and that was using Python 2.7; a lot of water under the bridge. From previous communications, you were asking a lot of newbie questions, which is why I suggested reading the two books.

The issue looks very much like you are asking for help in how to use Salt to achieve certain ends, rather than reporting an actual bug in functionality, hence the previous suggestion to ask questions on the Salt Community Discord forums.

I retried the command as in https://github.com/saltstack/salt/issues/55103#issuecomment-548544849, and it was successful.

[root@dhcp-10-47-15-216 david]# salt-call saltutil.cmd '*' grains.items
local:
    ----------
    tr9:
        ----------
        jid:
            20240806194140198825
        out:
            nested
        ret:
            ----------
            biosreleasedate:
                12/01/2006
            biosvendor:
                innotek GmbH
            biosversion:
                VirtualBox
            boardname:
                VirtualBox
            cpu_flags:
                - fpu
                - vme
                - de
                - pse
                - tsc
                - msr
                - pae
                - mce
                - cx8
                - apic
                - sep
                - mtrr
                - pge
                - mca
                - cmov
                - pat
                - pse36
                - clflush
                - mmx
                - fxsr
                - sse
                - sse2
                - ht
                - syscall
                - nx
                - rdtscp
                - lm
                - constant_tsc
                - rep_good
                - nopl
                - xtopology
                - nonstop_tsc
                - cpuid
                - tsc_known_freq
                - pni
                - pclmulqdq
                - ssse3
                - cx16
                - pcid
                - sse4_1
                - sse4_2
                - x2apic
                - movbe
                - popcnt
                - aes
                - xsave
                - avx
                - rdrand
                - hypervisor
                - lahf_lm
                - abm
                - 3dnowprefetch
                - pti
                - fsgsbase
                - bmi1
                - avx2
                - bmi2
                - invpcid
                - rdseed
                - clflushopt
                - arat
                - md_clear
                - flush_l1d
                - arch_capabilities
            cpu_model:
                Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
            cpuarch:
                x86_64
            cwd:
                /
            disks:
                - sr0
                - sda
            dns:
                ----------
                domain:
                ip4_nameservers:
                    - 192.19.189.10
                    - 192.19.189.30
                ip6_nameservers:
                nameservers:
                    - 192.19.189.10
                    - 192.19.189.30
                options:
                search:
                    - dhcp.broadcom.net
                sortlist:
            domain:
                dhcp.broadcom.net
            efi:
                False
            efi-secure-boot:
                False
            fqdn:
                dhcp-10-47-15-216.dhcp.broadcom.net
            fqdn_ip4:
                - 10.47.15.216
            fqdn_ip6:
            fqdns:
            gid:
                0
            gpus:
                |_
                  ----------
                  model:
                      SVGA II Adapter
                  vendor:
                      vmware
            groupname:
                root
            host:
                dhcp-10-47-15-216
            hwaddr_interfaces:
                ----------
                enp0s3:
                    08:00:27:6e:78:1b
                lo:
                    00:00:00:00:00:00
            id:
                tr9
            init:
                systemd
            ip4_gw:
                10.47.12.1
            ip4_interfaces:
                ----------
                enp0s3:
                    - 10.47.15.216
                lo:
                    - 127.0.0.1
            ip6_gw:
                False
            ip6_interfaces:
                ----------
                enp0s3:
                lo:
                    - ::1
            ip_gw:
                True
            ip_interfaces:
                ----------
                enp0s3:
                    - 10.47.15.216
                lo:
                    - 127.0.0.1
                    - ::1
            ipv4:
                - 10.47.15.216
                - 127.0.0.1
            ipv6:
                - ::1
            kernel:
                Linux
            kernelparams:
                |_
                  - BOOT_IMAGE
                  - (hd0,msdos1)/vmlinuz-5.14.0-427.28.1.el9_4.x86_64
                |_
                  - root
                  - /dev/mapper/rl-root
                |_
                  - ro
                  - None
                |_
                  - resume
                  - /dev/mapper/rl-swap
                |_
                  - rd.lvm.lv
                  - rl/root
                |_
                  - rd.lvm.lv
                  - rl/swap
                |_
                  - rhgb
                  - None
                |_
                  - quiet
                  - None
                |_
                  - crashkernel
                  - 1G-4G:192M,4G-64G:256M,64G-:512M
            kernelrelease:
                5.14.0-427.28.1.el9_4.x86_64
            kernelversion:
                #1 SMP PREEMPT_DYNAMIC Wed Jul 31 15:28:35 UTC 2024
            locale_info:
                ----------
                defaultencoding:
                    UTF-8
                defaultlanguage:
                    en_US
                detectedencoding:
                    utf-8
                timezone:
                    MDT
            localhost:
                dhcp-10-47-15-216
            lsb_distrib_codename:
                Blue Onyx
            lsb_distrib_id:
                Rocky Linux
            lsb_distrib_release:
                9.4
            lvm:
                ----------
                rl:
                    - home
                    - root
                    - swap
            machine_id:
                8d61305eea694a83a72f035727537642
            manufacturer:
                innotek GmbH
            master:
                localhost
            mdadm:
            mem_total:
                3659
            nodename:
                dhcp-10-47-15-216
            num_cpus:
                2
            num_gpus:
                1
            os:
                Rocky
            os_family:
                RedHat
            osarch:
                x86_64
            oscodename:
                Blue Onyx
            osfinger:
                Rocky Linux-9
            osfullname:
                Rocky Linux
            osmajorrelease:
                9
            osrelease:
                9.4
            osrelease_info:
                - 9
                - 4
            path:
                /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
            pid:
                3578
            productname:
                VirtualBox
            ps:
                ps -efHww
            pythonexecutable:
                /opt/saltstack/salt/bin/python3.10
            pythonpath:
                - /opt/saltstack/salt
                - /opt/saltstack/salt/extras-3.10
                - /opt/saltstack/salt/lib/python310.zip
                - /opt/saltstack/salt/lib/python3.10
                - /opt/saltstack/salt/lib/python3.10/lib-dynload
                - /opt/saltstack/salt/lib/python3.10/site-packages
            pythonversion:
                - 3
                - 10
                - 14
                - final
                - 0
            saltpath:
                /opt/saltstack/salt/lib/python3.10/site-packages/salt
            saltversion:
                3006.9
            saltversioninfo:
                - 3006
                - 9
            selinux:
                ----------
                enabled:
                    True
                enforced:
                    Enforcing
            serialnumber:
                0
            server_id:
                185603172
            shell:
                /bin/sh
            ssds:
            swap_total:
                4043
            systemd:
                ----------
                features:
                    +PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
                version:
                    252
            systempath:
                - /usr/local/sbin
                - /usr/local/bin
                - /usr/sbin
                - /usr/bin
            transactional:
                False
            uid:
                0
            username:
                root
            uuid:
                059e7a66-d614-864c-aea3-a280ccbc8574
            virtual:
                VirtualBox
            zfs_feature_flags:
                False
            zfs_support:
                False
            zmqversion:
                4.3.4
        retcode:
            0
[root@dhcp-10-47-15-216 david]# 

This is with salt-master and salt-minion on Rocky Linux 9 (VirtualBox VM that I had handy).

For the versions report, the commands to run are salt-run --versions-report on a salt-master and salt-call --local test.versions on a salt-minion. These show a lot of information about the system and the OS you are running, which can be very helpful in determining a resolution and is required before any investigation is made.

Lastly, the commands you used did not produce errors:

[root@dhcp-10-47-15-216 david]# salt tr9 test.ping
tr9:
    True
[root@dhcp-10-47-15-216 david]# salt 'tr9' saltutil.cmd '*' grains.items
tr9:
    ----------
    tr9:
        ----------
        jid:
            20240806194749964754
        out:
            nested
        ret:
            ... (full grains.items output identical to the dump above; elided) ...
        retcode:
            0
[root@dhcp-10-47-15-216 david]# 

dnessett commented 3 months ago

@dmurphy18 : Let me address the first issue. I am not a SALT noob, but neither am I a SALT expert. I am a SALT user managing a set of 30 laptops used by a K-8 school. I have been using SALT for about a year and a half, but I use it only sparingly, when some task arises that I need to complete. Consequently, I frequently forget what I learned previously (since it may have been 6-12 months ago). I am certainly not immersed in SALT programming on a day-to-day basis.

Here is the version information you requested:

On the Master:

dnessett@homelserv:~$ salt-run --versions-report
Salt Version:
          Salt: 3006.9

Python Version:
        Python: 3.10.14 (main, Jun 26 2024, 11:44:37) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.4
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.17.0
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: linuxmint 21.3 virginia
        locale: utf-8
       machine: x86_64
       release: 6.5.0-44-generic
        system: Linux
       version: Linux Mint 21.3 virginia

On the minion:

[sudo] password for dnessett:             
local:
    Salt Version:
              Salt: 3006.9

    Python Version:
            Python: 3.10.14 (main, Jun 26 2024, 11:44:37) [GCC 11.2.0]

    Dependency Versions:
              cffi: 1.14.6
          cherrypy: 18.6.1
      cryptography: 42.0.5
          dateutil: 2.8.1
         docker-py: Not Installed
             gitdb: Not Installed
         gitpython: Not Installed
            Jinja2: 3.1.4
           libgit2: Not Installed
      looseversion: 1.0.2
          M2Crypto: Not Installed
              Mako: Not Installed
           msgpack: 1.0.2
      msgpack-pure: Not Installed
      mysql-python: Not Installed
         packaging: 22.0
         pycparser: 2.21
          pycrypto: Not Installed
      pycryptodome: 3.19.1
            pygit2: Not Installed
      python-gnupg: 0.4.8
            PyYAML: 6.0.1
             PyZMQ: 23.2.0
            relenv: 0.17.0
             smmap: Not Installed
           timelib: 0.2.4
           Tornado: 4.5.3
               ZMQ: 4.3.4

    System Versions:
              dist: linuxmint 21.1 vera
            locale: utf-8
           machine: x86_64
           release: 5.15.0-116-generic
            system: Linux
           version: Linux Mint 21.1 vera

That you ran your tests on a VM may suggest that the problem I am experiencing is related to the communications layer of SALT. I am running two separate machines connected over a local area network using wifi. I am only suggesting this as a possible cause; I am not a SALT developer and have no knowledge of how the SALT communications layer is implemented in a VM environment.

It is interesting that

salt-call saltutil.cmd '*' grains.items

runs in a VM environment, but not in a non-virtual environment (as shown by the first message I posted). Is this a hint that might suggest where the problem lies?

Added later: I realized afterwards that we used two different SALT commands. You used the command specified above, whereas I used

sudo salt 'MOLS-T-0' saltutil.cmd '*' grains.items

So it has not been shown that the success of the command you used and the failure of the command I used are related.

dmurphy18 commented 3 months ago

@dnessett Running over a localhost VM versus a real network should not make any difference unless there is some network connectivity issue, such as a firewall blocking communication; that is all transport layer.

However, I did run into an issue using a salt-minion on another container instance, but this is more related to using saltutil, which is really meant for salt-masters and runners. See the following:

root [ / ]# salt-call saltutil.cmd '*' grains.items
[DEBUG   ] Minion of '10.47.15.216' is handling event tag '__master_connected'
[ERROR   ] Unable to connect to the salt master publisher at /var/run/salt/master
[ERROR   ] An un-handled exception was caught by Salt's global exception handler:
SaltClientError: The salt master could not be contacted. Is master running?
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 386, in run_job
    pub_data = self.pub(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/__init__.py", line 1886, in pub
    raise SaltClientError
salt.exceptions.SaltClientError

The problem here is that there is no file /var/run/salt/master on the container instance with the salt-minion, and there should not be, since no salt-master is installed there. This is an outcome of using a salt-master/runner command that happens to work if the salt-minion and salt-master are on the same system, but it is not the correct way to obtain grains.items.
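
A quick way to see the asymmetry (a hedged illustration; the exact socket names can vary with configuration) is to list the master's IPC directory on each host:

# on the salt-master host the publisher sockets exist
ls /var/run/salt/master
master_event_pub.ipc  master_event_pull.ipc  publish_pull.ipc  workers.ipc

# on a minion-only host there is nothing for the client to connect to
ls /var/run/salt/master
ls: cannot access '/var/run/salt/master': No such file or directory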

As stated previously, this is incorrect usage of the Salt CLI that happens to work in some instances. And just to confirm that there is no connectivity problem with the salt-master:

root [ / ]# salt-call test.version 
[DEBUG   ] Minion of '10.47.15.216' is handling event tag '__master_connected'
local:
    3006.9
root [ / ]# 

By not using --local, the salt-call command will inform the salt-master of the command, and it will fail if it cannot connect to the salt-master.
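
For reference, a minimal sketch of the two modes (version output illustrative):

# --local runs purely on the minion and never contacts the master
salt-call --local test.version
local:
    3006.9

# without --local, salt-call pulls pillar from the master and reports the
# job to it, so it fails if the master cannot be reached
salt-call test.version
local:
    3006.9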

dnessett commented 3 months ago

@dmurphy18 : If I understand you correctly, the failure of

sudo salt 'MOLS-T-0' saltutil.cmd '*' grains.items

(using my version of the call) is related to the misuse of saltutil.cmd, which is intended for masters/runners, not masters/minions (I may have this wrong, since I am not familiar with runners and how they relate to minions; again, I am not a SALT expert, only an unsophisticated SALT user).

So, that brings the issue back to what I was originally interested in: rebooting a minion using state.apply. Reiterating the test I wrote: in /srv/salt is the file test-reboot.sls. The content of this file is:

hosts-restart:
  salt.function:
    - name: system.reboot
    - tgt: '*'

hosts-wait-for-reboot:
  salt.wait_for_event:
    - name: salt/minion/*/start
    - id_list: {{ grains.id }}
    - timeout: 600
    - require:
      - salt: hosts-restart

When I execute the following state change:

sudo salt 'MOLS-T-0' state.apply test-reboot

I get the following error:

MOLS-T-0:
----------
          ID: hosts-restart
    Function: salt.function
        Name: system.reboot
      Result: False
     Comment: The salt master could not be contacted. Is master running?
     Started: 15:14:15.606157
    Duration: 110.519 ms
     Changes:   
----------
          ID: hosts-wait-for-reboot
    Function: salt.wait_for_event
        Name: salt/minion/*/start
      Result: False
     Comment: One or more requisite failed: test-reboot.hosts-restart
     Started: 15:14:15.717001
    Duration: 0.004 ms
     Changes:   

Both commands produce the same Comment in the error report, that is, "The salt master could not be contacted. Is master running?"

While I do not know enough about the internals of SALT to say with any certainty, it sure looks like these two problems are related. But I am open to being disabused of this view.

whytewolf commented 3 months ago

@dmurphy18 saltutil is both a runner AND a module. https://docs.saltproject.io/en/latest/ref/modules/all/salt.modules.saltutil.html#salt.modules.saltutil.cmd_iter saltutil.cmd is the module version.

However, saltutil.cmd requires that the minion being targeted is running on a master. If it isn't ... then the error makes sense.

@dnessett The state is also doing the exact same thing. saltmod states are meant for orchestration, not state.apply; they treat the local minion as if it were located on a master.
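
To make the distinction concrete, here is a hedged sketch of the two saltutil entry points and of where saltmod states belong (the 'master-minion' target is hypothetical):

# runner: executes on the master via salt-run
salt-run saltutil.sync_all

# execution module: executes on a targeted minion; saltutil.cmd only
# succeeds if that minion is running on the master host
salt 'master-minion' saltutil.cmd '*' grains.items

# saltmod states (salt.function, salt.wait_for_event, ...) are meant to be
# driven by the orchestrate runner on the master, not by state.apply
salt-run state.orchestrate orch.reboot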

whytewolf commented 3 months ago

I just went and looked at the original ticket ... and it was the exact same issue: someone trying to run saltutil.cmd on a minion that isn't on the master, when the documentation actually says it needs to be running on the master.

dmurphy18 commented 3 months ago

Thanks for the clarification @whytewolf; as always, encyclopedic knowledge of Salt.

dnessett commented 3 months ago

@whytewolf Thanks for the clarification. I am unfamiliar with orchestration. Could you point me to the documentation for this feature?

Added later: When I first set up the laptop farm I manage, I interacted with you (@whytewolf) and others (e.g., @OrangeDog) in a chat room to get help. Does that still exist? I tried to find the Discord Community SALT forums, but without success. When I first accessed the Discord community website, I was directed to create a server, which, when established, looked like some sort of Facebook-like interface with "friends" and cartoon characters that seemed more appropriate for children than for adults discussing SALT issues. If the chat room no longer exists, can you point me to the right Discord Community SALT forum URL so I can ask questions about orchestration? Thanks.

dnessett commented 3 months ago

@OrangeDog : OK, thanks.

dmurphy18 commented 3 months ago

@OrangeDog @dnessett Salt uses Discord; Slack got retired some time ago.

dnessett commented 3 months ago

@dmurphy18 : Well, I guess that brings me back to the original question. I am not familiar with Discord and find it somewhat weird. Can you supply me with a URL I can use to get access to the SALT discussion forum(s)?

dmurphy18 commented 3 months ago

@dnessett Start here in the general channel https://discord.com/channels/1200072194781368340/1219742614257930381

Please consider closing this issue, given the information provided.

dnessett commented 3 months ago

@dmurphy18 I will close the issue, but first would like to figure out how to access the Salt forums on discord. When I click on the link you provided, I get the following message:

NO TEXT CHANNELS You find yourself in a strange place. You don't have access to any text channels, or there are none in this server.

dmurphy18 commented 3 months ago

@dnessett try these links http://discord.gg/YVQamSwV3g?trk=public_post-text and https://www.linkedin.com/posts/saltproject_join-the-salt-project-community-discord-server-activity-7183183520352153600-ejml

dnessett commented 3 months ago

@dmurphy18 Thanks, the first one worked, but I have not used my LinkedIn account for years and the user_name and password no longer work. Is the LinkedIn account used for a different set of fora?

dmurphy18 commented 3 months ago

@dnessett google search pulled it up, as in 'let me google that for you' :rofl:

dnessett commented 3 months ago

As specified by @whytewolf, this issue is not a bug, but rather results from not using orchestration when required.

dnessett commented 2 months ago

Just in case there are those who view this issue and wonder what the solution is to the problem of rebooting minion machines, here are two solutions (crafted by @whytewolf). Both launch the reboot with cmd.run_bg running salt-call --local system.reboot 1 on the minion, presumably so the minion can return the job result before it goes down. If you wish to reboot all minions for which the master has keys, put the following .sls file in /srv/salt/orch with the name reboot.sls:

{% set minions = salt["saltutil.runner"]("cache.grains", ['*']).keys()|list %}

hosts-restart:
  salt.function:
    - name: cmd.run_bg
    - tgt: {{ minions | join(",")}}
    - tgt_type: list
    - arg: 
        - "salt-call --local system.reboot 1"

hosts-wait-for-reboot:
  salt.wait_for_event:
    - name: salt/minion/*/start
    - id_list: {{ minions | json }}
    - timeout: 600
    - require:
      - salt: hosts-restart

Then execute:

salt-run state.orchestrate orch.reboot

Alternatively, if you wish to reboot only those minion machines that are currently listening to the master (i.e., not machines that are turned off; this is probably the right choice, since otherwise you will get errors for the machines that are not powered on), put the following .sls file in /srv/salt/orch with the name reboot-alive.sls:

{% set minions = salt["saltutil.runner"]("manage.alived") %}

hosts-restart:
  salt.function:
    - name: cmd.run_bg
    - tgt: {{ minions | join(",")}}
    - tgt_type: list
    - arg: 
        - "salt-call --local system.reboot 1"

hosts-wait-for-reboot:
  salt.wait_for_event:
    - name: salt/minion/*/start
    - id_list: {{ minions | json }}
    - timeout: 600
    - require:
      - salt: hosts-restart

Then execute:

salt-run state.orchestrate orch.reboot-alive
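
As a sanity check before running either orchestration, the runner used in reboot-alive.sls can be invoked directly to preview which minions would be targeted (minion IDs below are illustrative):

salt-run manage.alived
- MOLS-T-0
- MOLS-T-5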