saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] Error while bringing up minion for multi-master. Minion unable to successfully connect to a Salt Master. #66438

Open alrf opened 2 months ago

alrf commented 2 months ago

Description

The minion can't connect to the master (both are version 3006.7):

```
Error while bringing up minion for multi-master. Is master at serverXXX.example.com responding? The error message was Unable to sign_in to master: Attempt to authenticate with the salt master failed with timeout error
```

Setup

The master is a onedir installation (Debian). The minion is a regular installation on Fedora CoreOS (FCOS); I'm not sure whether onedir can be used there.


Steps to Reproduce the behavior

Logs:

```
2024-04-25 15:20:39,673 [salt.cli.daemons :284 ][INFO    ][3591] Starting up the Salt Minion
2024-04-25 15:20:39,674 [salt.utils.event :284 ][INFO    ][3591] Starting pull socket on /var/run/salt/minion/minion_event_b375127e98_pull.ipc
2024-04-25 15:20:39,928 [salt.minion      :284 ][INFO    ][3591] Creating minion process manager
2024-04-25 15:21:15,079 [salt.minion      :284 ][ERROR   ][3591] Error while bringing up minion for multi-master. Is master at serverXXX.example.com responding? The error message was Unable to sign_in to master: Attempt to authenticate with the salt master failed with timeout error
2024-04-25 15:21:39,934 [salt.minion      :284 ][ERROR   ][3591] Minion unable to successfully connect to a Salt Master.
```

This is not a firewall/network issue; the salt-master ports are reachable from the minion:

```
telnet serverXXX.example.com 4505
Trying XX.XX.XX.XX...
Connected to serverXXX.example.com.
Escape character is '^]'.
quit

telnet serverXXX.example.com 4506
Trying XX.XX.XX.XX...
Connected to serverXXX.example.com.
Escape character is '^]'.
quit
```
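The same reachability check can be scripted, for example as a provisioning step before the minion starts. A minimal Python sketch using only the standard library; `port_reachable` is a hypothetical helper, not part of Salt. Like telnet, a successful connect only proves that the TCP handshake completes on 4505/4506, which is consistent with this report: the ports are open, yet authentication over 4506 still times out.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Like `telnet host port`, this only checks the TCP handshake; it does not
    exercise Salt's authentication exchange on the request port (4506).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS resolution failures and timeouts
        # (socket.gaierror and socket.timeout are both OSError subclasses).
        return False
```

For example, `port_reachable("serverXXX.example.com", 4505)` and `port_reachable("serverXXX.example.com", 4506)` should both return `True` in the setup described above.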

Expected behavior

The minion should be able to connect to the master.

Versions Report

```yaml
Master:
salt --versions-report
Salt Version:
  Salt: 3006.7

Python Version:
  Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions:
  cffi: 1.14.6
  cherrypy: unknown
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.3
  libgit2: Not Installed
  looseversion: 1.0.2
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 22.0
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.19.1
  pygit2: Not Installed
  python-gnupg: 0.4.8
  PyYAML: 6.0.1
  PyZMQ: 23.2.0
  relenv: 0.15.1
  smmap: Not Installed
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: debian 11 bullseye
  locale: utf-8
  machine: x86_64
  release: 5.10.0-26-amd64
  system: Linux
  version: Debian GNU/Linux 11 bullseye

Minion:
salt-call --versions-report
/usr/lib/python3.12/site-packages/salt/ext/tornado/util.py:246: SyntaxWarning: invalid escape sequence '\d'
  """Unescape a string escaped by `re.escape`.
Salt Version:
  Salt: 3006.7

Python Version:
  Python: 3.12.2 (main, Feb 21 2024, 00:00:00) [GCC 13.2.1 20231205 (Red Hat 13.2.1-6)]

Dependency Versions:
  cffi: Not Installed
  cherrypy: Not Installed
  dateutil: 2.8.2
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.3
  libgit2: Not Installed
  looseversion: 1.3.0
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.5
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 23.1
  pycparser: Not Installed
  pycrypto: Not Installed
  pycryptodome: 3.20.0
  pygit2: Not Installed
  python-gnupg: Not Installed
  PyYAML: 6.0.1
  PyZMQ: 25.1.0
  relenv: Not Installed
  smmap: Not Installed
  timelib: Not Installed
  Tornado: 6.3.3
  ZMQ: 4.3.4

System Versions:
  dist: fedora 39.20240407.3.0
  locale: utf-8
  machine: x86_64
  release: 6.8.4-200.fc39.x86_64
  system: Linux
  version: Fedora Linux 39.20240407.3.0
```
alrf commented 2 months ago

It seems that the Python versions must match. I was able to get a onedir installation working on FCOS, and then the minion connected to the master.
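The two versions reports above differ in, among other things, Python (3.10.13 vs 3.12.2), Tornado (4.5.3 vs 6.3.3) and PyZMQ (23.2.0 vs 25.1.0). A small script can list every master/minion difference, as the issue template asks. This is a minimal sketch assuming the `Key: value` layout shown in the reports; `parse_versions_report` and `diff_reports` are hypothetical helpers, not part of Salt, and the diff only shows *what* differs, not which difference causes the timeout.

```python
import re

# Matches "Key: value" lines such as "PyZMQ: 25.1.0" or
# "Python: 3.12.2 (main, ...)". Section headers like "Salt Version:" contain a
# space before the colon and have no value, so they do not match.
LINE_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+):\s+(.+?)\s*$")


def parse_versions_report(text: str) -> dict:
    """Collect {component: version} from `salt --versions-report` output."""
    versions = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            versions[m.group(1)] = m.group(2)
    return versions


def diff_reports(master: dict, minion: dict) -> dict:
    """Return {component: (master_value, minion_value)} for differing entries."""
    keys = sorted(set(master) | set(minion))
    return {
        k: (master.get(k), minion.get(k))
        for k in keys
        if master.get(k) != minion.get(k)
    }
```

Paste each report into a string and call `diff_reports(parse_versions_report(master_txt), parse_versions_report(minion_txt))`; components present on only one side show up with `None` for the other.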

sasidharjetb commented 2 months ago

I got the same issue. How can I use onedir with the bootstrap script? I'm on Ubuntu 22 and currently install with:

```
curl -o bootstrap-salt.sh -L https://bootstrap.saltproject.io ;
```

Minion debug log:

```
[ERROR ][3301590] Error while bringing up minion for multi-master. Is master at salt01 responding?
2024-05-03 08:35:50,727 [salt.minion :819 ][DEBUG ][3301590] Connecting to master. Attempt 1 of 1
2024-05-03 08:35:50,727 [salt.utils.network:2314][DEBUG ][3301590] "salt01" Not an IP address? Assuming it is a hostname.
2024-05-03 08:35:50,736 [salt.minion :256 ][DEBUG ][3301590] Master URI: tcp://10.16.1.6:4506
2024-05-03 08:35:50,737 [salt.crypt :514 ][DEBUG ][3301590] Re-using AsyncAuth for ('/etc/salt/pki/minion', 'aksdevminiongcp01', 'tcp://10.16.1.6:4506')
2024-05-03 08:35:50,758 [salt.transport.zeromq:158 ][DEBUG ][3301590] Generated random reconnect delay between '1000ms' and '11000ms' (10627)
2024-05-03 08:35:50,758 [salt.transport.zeromq:165 ][DEBUG ][3301590] Setting zmq_reconnect_ivl to '10627ms'
2024-05-03 08:35:50,759 [salt.transport.zeromq:169 ][DEBUG ][3301590] Setting zmq_reconnect_ivl_max to '11000ms'
2024-05-03 08:35:50,759 [salt.crypt :208 ][DEBUG ][3301590] salt.crypt.get_rsa_key: Loading private key
2024-05-03 08:35:50,759 [salt.crypt :900 ][DEBUG ][3301590] Loaded minion key: /etc/salt/pki/minion/minion.pem
2024-05-03 08:35:50,770 [salt.utils.event :315 ][DEBUG ][3301590] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_ccc4af074d_pub.ipc
2024-05-03 08:35:50,770 [salt.utils.event :316 ][DEBUG ][3301590] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_ccc4af074d_pull.ipc
2024-05-03 08:35:50,770 [salt.transport.zeromq:212 ][DEBUG ][3301590] Connecting the Minion to the Master publish port, using the URI: tcp://10.16.1.6:4505
2024-05-03 08:35:50,771 [salt.transport.zeromq:216 ][DEBUG ][3301590] <salt.transport.zeromq.PublishClient object at 0x72cd64195c00> connecting to tcp://10.16.1.6:4505
2024-05-03 08:35:50,773 [salt.utils.event :823 ][DEBUG ][3301590] Sending event: tag = __master_connected; data = {'master': 'salt01', '_stamp': '2024-05-03T08:35:50.773481'}
2024-05-03 08:35:50,774 [salt.crypt :208 ][DEBUG ][3301590] salt.crypt.get_rsa_key: Loading private key
2024-05-03 08:35:50,774 [salt.crypt :900 ][DEBUG ][3301590] Loaded minion key: /etc/salt/pki/minion/minion.pem
2024-05-03 08:35:50,786 [salt.transport.ipc:372 ][DEBUG ][3301590] Closing IPCMessageClient instance
```
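When a pasted debug log gets flattened onto one line, the entries can be recovered by splitting on the timestamp header that starts every Salt log line. A minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS,mmm [logger:line][LEVEL][pid]` layout seen in the logs in this issue (padded or not); `split_log` and `LogEntry` are hypothetical names. Any text before the first header, such as the truncated `[ERROR ]` fragment above, is ignored.

```python
import re
from typing import List, NamedTuple

# Header of one Salt log entry, e.g.
# "2024-05-03 08:35:50,727 [salt.minion :819 ][DEBUG ][3301590] "
ENTRY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<logger>[^:\]]+?)\s*:\s*(?P<lineno>\d+)\s*\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<pid>\d+)\]\s*"
)


class LogEntry(NamedTuple):
    ts: str
    logger: str
    level: str
    message: str


def split_log(text: str) -> List[LogEntry]:
    """Split a (possibly flattened) Salt log into entries by header."""
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next header, or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append(
            LogEntry(m["ts"], m["logger"], m["level"], text[m.end():end].strip())
        )
    return entries
```

For instance, `[e.message for e in split_log(text) if e.level == "ERROR"]` keeps only the errors, and the last entries show how far the connection got before it stopped.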

alrf commented 2 months ago

I found another issue on FCOS: SELinux. While SELinux is in Enforcing mode, salt-minion can't connect to the salt-master.

However, the documentation at https://docs.saltproject.io/en/latest/topics/troubleshooting/index.html#salt-and-selinux is very old (its examples target CentOS/RHEL 5 and 6) and doesn't help on FCOS:

```
# chcon system_u:object_r:rpm_exec_t:s0 /usr/bin/salt-minion
chcon: failed to change context of '/usr/bin/salt-minion' to 'system_u:object_r:rpm_exec_t:s0': Read-only file system
# chcon system_u:object_r:rpm_exec_t:s0 /usr/bin/salt-call
chcon: failed to change context of '/usr/bin/salt-call' to 'system_u:object_r:rpm_exec_t:s0': Read-only file system
```

This fails because FCOS has an immutable / and a read-only /usr: https://docs.fedoraproject.org/en-US/fedora-coreos/storage/#_immutable_read_only_usr
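`chcon` relabels a file by rewriting its `security.selinux` extended attribute, which requires a writable filesystem; that is where the `Read-only file system` error comes from. To check in advance which paths are affected, a minimal Python sketch (Linux/POSIX only) can test the read-only mount flag; `on_read_only_fs` is a hypothetical helper.

```python
import os


def on_read_only_fs(path: str) -> bool:
    """Return True if `path` lives on a filesystem mounted read-only.

    Reads the ST_RDONLY bit from statvfs(2)'s f_flag field; available on
    POSIX systems only.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

On FCOS, `on_read_only_fs("/usr/bin/salt-minion")` would be expected to return `True`, matching the chcon errors above.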

alrf commented 2 months ago

SELinux denies these actions (there are many of them in the output):

```
# ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -ts today
time->Tue May  7 16:00:17 2024
type=AVC msg=audit(1715097617.514:1641): avc:  denied  { name_connect } for  pid=5396 comm="/usr/lib/opt/sa" dest=4506 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:salt_port_t:s0 tclass=tcp_socket permissive=0
```

However:

```
# semanage port -l | grep salt
salt_port_t                    tcp      4505, 4506
```
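Reading the denial field by field explains the "However": port 4506 is already labeled `salt_port_t`, as `semanage` confirms, so the port label is not the problem. What is denied is the source domain `init_t` (the process context) performing `name_connect` on that port type, with `permissive=0` meaning the denial is enforced. A process ends up in `init_t` when systemd starts a binary without a domain transition into a service-specific domain. A minimal Python sketch for pulling these fields out of an AVC record; `parse_avc` and `type_of` are hypothetical helpers.

```python
import re

# key=value pairs in an audit record; values may be quoted (comm="...").
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
# The permission set inside "denied { ... }".
PERMS_RE = re.compile(r"denied\s+\{\s*([^}]*)\}")


def parse_avc(record: str) -> dict:
    """Extract the fields needed to read an SELinux AVC denial."""
    fields = {k: v.strip('"') for k, v in FIELD_RE.findall(record)}
    perms = PERMS_RE.search(record)
    fields["denied"] = perms.group(1).split() if perms else []
    return fields


def type_of(context: str) -> str:
    """Return the type of an SELinux context user:role:type:level."""
    return context.split(":")[2]
```

For the record above, `type_of(f["scontext"])` gives `init_t` and `type_of(f["tcontext"])` gives `salt_port_t`, so the missing rule is "allow `init_t` to `name_connect` to `salt_port_t`".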
alrf commented 2 months ago

All of the issues described above (the minion/master connection and SELinux) occur on FCOS versions 39.20231101.3.0 and 39.20240210.3.0. The latest FCOS version as of today, 39.20240407.3.0, doesn't have these problems; everything works out of the box.

But I can't use it, because the OKD4 cluster (even its latest version) is tied to a specific FCOS version, which is not the latest one.

alrf commented 2 months ago

I managed to solve this by:

1. `# rpm-ostree install setroubleshoot` installs the required tools on FCOS.
2. `# ausearch -m AVC | audit2allow -m salt_fix > salt_fix.te` generates an allow policy from the denials in audit.log.
3. `# more salt_fix.te` shows the policy generated by audit2allow, for review. In my case it was:

```
module salt_fix 1.0;

require {
    type getty_t;
    type etc_t;
    type sudo_exec_t;
    type dmidecode_exec_t;
    type var_t;
    type systemd_hwdb_t;
    type kernel_t;
    type init_t;
    type systemd_notify_t;
    type ssh_exec_t;
    type salt_port_t;
    type http_port_t;
    class capability dac_override;
    class capability2 checkpoint_restore;
    class unix_dgram_socket sendto;
    class file { append create execute execute_no_trans ioctl map open read rename unlink write };
    class tcp_socket name_connect;
}

#============= getty_t ==============
allow getty_t self:capability2 checkpoint_restore;

#============= init_t ==============
allow init_t dmidecode_exec_t:file { execute execute_no_trans open read };

#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow init_t dmidecode_exec_t:file map;
allow init_t etc_t:file write;

#!!!! This avc can be allowed using the boolean 'nis_enabled'
allow init_t http_port_t:tcp_socket name_connect;
allow init_t salt_port_t:tcp_socket name_connect;
allow init_t ssh_exec_t:file execute;
allow init_t sudo_exec_t:file execute;
allow init_t var_t:file { append create ioctl open read rename unlink write };

#============= systemd_hwdb_t ==============
allow systemd_hwdb_t self:capability dac_override;

#============= systemd_notify_t ==============
allow systemd_notify_t kernel_t:unix_dgram_socket sendto;
```
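Step 3 (reviewing the generated policy) is easier with the rules grouped by source domain. Here the `init_t` rules are the ones that can match the minion (the AVC above has `scontext=...init_t`), while the `getty_t`, `systemd_hwdb_t` and `systemd_notify_t` rules appear to come from unrelated denials that an unfiltered `ausearch -m AVC` also picked up. A minimal Python sketch, assuming the one-rule-per-line layout audit2allow produced above; `allow_rules_by_domain` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches audit2allow lines such as:
#   allow init_t salt_port_t:tcp_socket name_connect;
#   allow init_t var_t:file { append create ioctl open read };
ALLOW_RE = re.compile(
    r"^\s*allow\s+(\S+)\s+(\S+):(\S+)\s+(\{[^}]*\}|\S+?)\s*;", re.MULTILINE
)


def allow_rules_by_domain(te_text: str) -> dict:
    """Group allow rules from a .te file as {source: [(target, class, perms)]}."""
    rules = defaultdict(list)
    for src, tgt, cls, perms in ALLOW_RE.findall(te_text):
        # "{ read write }" -> ["read", "write"]; "name_connect" -> ["name_connect"]
        rules[src].append((tgt, cls, perms.strip("{} ").split()))
    return dict(rules)
```

Printing `allow_rules_by_domain(open("salt_fix.te").read())` one domain at a time makes it easier to decide which rules actually belong in a Salt-specific module.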

4. If the policy looks legitimate, `# ausearch -m AVC | audit2allow -M salt_fix` creates the compiled policy package.
5. `# semodule -i salt_fix.pp` installs the policy package (.pp).
6. `# semodule -l | grep salt_fix` verifies that the module is loaded.
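Because the policy may have to be regenerated more than once (see below), steps 4 to 6 can be wrapped so they are repeatable. A minimal Python sketch that by default only prints the commands; run it as root with `dry_run=False` to execute them. `build_policy_commands` and `run` are hypothetical helpers; the commands are exactly the ones listed above, and reviewing the `.te` file (step 3) before installing is still up to you.

```python
import shlex
import subprocess


def build_policy_commands(module: str = "salt_fix") -> list:
    """Return the shell commands for steps 4-6 (nothing is executed here)."""
    name = shlex.quote(module)
    return [
        f"ausearch -m AVC | audit2allow -M {name}",  # writes <module>.te and <module>.pp
        f"semodule -i {name}.pp",                     # install the compiled module
        f"semodule -l | grep {name}",                 # confirm it is loaded
    ]


def run(commands: list, dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print("would run:", cmd)
            continue
        # shell=True is needed for the pipes; check=True stops at the first
        # failing command (including grep not finding the module).
        subprocess.run(cmd, shell=True, check=True)
```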

After all these steps, the connection between the minion and master was established, the minion process started, and test.ping succeeded. BUT: most of the applied Salt states still failed because of SELinux. It seems that for each specific state you would have to generate and load a new SELinux policy.

So, in the end, the problem is NOT fully solved.