saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Fedora 40 (rawhide) minion cannot maintain connection to master on RHEL 8. #65844

Open bchill opened 5 months ago

bchill commented 5 months ago

Discussed in https://github.com/saltstack/salt/discussions/65825

Originally posted by **bchill** January 9, 2024

I have a master running on RHEL 8 and a minion running on Fedora 40. Both OSes are up to date and both are running salt rpms:

RHEL 8 master: salt*-3006.5-0.x86_64
Fedora 40 minion: salt*-3006.5-1.fc40.noarch

I have several other minions running on various Ubuntu/RHEL/Rocky/CentOS versions, some IPv4 and some IPv6, some behind home routers and some not - no problems with any of those. This makes me suspect that this is an issue with the 'newness' of fc40 and/or maybe the use of Python 3.12 (used by the rpm - not my doing).

The master accepts the key, but after that the connection either cannot be established or keeps dropping as the minion tries repeatedly to connect to the master:

```
2024-01-09 14:49:52,220 [salt.channel.client:789 ][TRACE ][12345] Failed to send msg SaltReqTimeoutError('Message timed out')
2024-01-09 14:49:52,220 [salt.channel.client:789 ][TRACE ][12345] ReqChannel send clear load={'cmd': '_auth', 'id': 'minion.example.com', 'nonce': 'xxxxxx', 'autosign_grains': {'uuid': 'XXXXXXXXXXX'}, 'pub': '-----BEGIN PUBLIC KEY-----\nxxxxxxxx\n-----END PUBLIC KEY-----'}
2024-01-09 14:49:57,226 [salt.channel.client:789 ][TRACE ][12345] Failed to send msg SaltReqTimeoutError('Message timed out')
2024-01-09 14:49:57,227 [salt.channel.client:789 ][TRACE ][12345] ReqChannel send clear load={'cmd': '_auth', 'id': 'minion.example.com', 'nonce': 'xxxxxx', 'autosign_grains': {'uuid': 'XXXXXXXXXXX'}, 'pub': '-----BEGIN PUBLIC KEY-----\nxxxxxxxx\n-----END PUBLIC KEY-----'}
2024-01-09 14:50:02,234 [salt.channel.client:789 ][TRACE ][12345] Failed to send msg SaltReqTimeoutError('Message timed out')
2024-01-09 14:50:02,235 [salt.channel.client:789 ][DEBUG ][12345] Closing AsyncReqChannel instance
2024-01-09 14:50:02,238 [salt.minion :789 ][ERROR ][12345] Error while bringing up minion for multi-master. Is master at master.example.com responding?
2024-01-09 14:50:12,249 [salt.minion :789 ][DEBUG ][12345] Connecting to master. Attempt 1 of 1
2024-01-09 14:50:12,251 [salt.utils.network:789 ][DEBUG ][12345] "master.example.com" Not an IP address? Assuming it is a hostname.
2024-01-09 14:50:12,331 [salt.minion :789 ][TRACE ][12345] Custom source interface required: enp2s0f0
```

nc shows that the master is accessible on both 4505/tcp and 4506/tcp. ss shows only intermittent connections established to 4506 (on the minion) but never 4505. The 4506 connections last about 30 seconds before disappearing.

The master reports nothing wrong and simply cycles with the minion's connection requests:

```
2024-01-09 18:39:30,365 [salt.channel.server:289 ][INFO ][123456] Authentication request from minion.example.com
2024-01-09 18:39:30,366 [salt.utils.user :23 ][TRACE ][123456] Trying os.getgrouplist for 'root'
2024-01-09 18:39:30,368 [salt.utils.user :23 ][TRACE ][123456] Group list for user 'root': []
2024-01-09 18:39:30,368 [salt.channel.server:547 ][INFO ][123456] Authentication accepted from minion.example.com
2024-01-09 18:39:30,368 [salt.crypt :216 ][DEBUG ][123456] salt.crypt.get_rsa_pub_key: Loading public key
2024-01-09 18:39:30,383 [salt.transport.ipc:23 ][TRACE ][123456] IPCClient: Connecting to socket: /var/run/salt/master/master_event_pull.ipc
2024-01-09 18:39:30,383 [salt.transport.ipc:23 ][TRACE ][123453] IPCServer: Handling connection to address:
2024-01-09 18:39:30,384 [salt.utils.event :832 ][DEBUG ][123456] Sending event: tag = salt/auth; data = {'result': True, 'act': 'accept', 'id': 'minion.example.com', 'pub': '-----BEGIN PUBLIC KEY-----\nxxxxxxxx\n-----END PUBLIC KEY-----', '_stamp': '2024-01-10T00:39:30.384042'}
2024-01-09 18:39:30,385 [salt.crypt :280 ][DEBUG ][123456] salt.crypt.get_rsa_key: Loading private key
2024-01-09 18:39:30,385 [salt.crypt :293 ][DEBUG ][123456] salt.crypt.sign_message: Signing message.
```

I really don't know what to look for at this point. Is this indeed a compatibility bug related to different Python versions or some such?

Thanks for any help!
Brian

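
For anyone who wants to repeat the nc reachability check from the minion without nc installed, here is a minimal Python sketch. The helper names `port_open` and `check_master` are hypothetical and not part of Salt. The ports 4505 and 4506 and the host `master.example.com` come from the post above. It only tests whether a plain TCP connection can be opened, which is what the nc check established. It does not exercise Salt's transport or authentication, so a passing result here is consistent with the failure described.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_master(host, ports=(4505, 4506)):
    """Map each port to whether a TCP connection could be opened (hypothetical helper)."""
    return {port: port_open(host, port) for port in ports}


# Example call (host name taken from the logs in the post):
# print(check_master("master.example.com"))
```

If both ports report `True` while the minion still logs `SaltReqTimeoutError`, the problem is likely above the TCP layer, which matches the symptoms in the post.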
dwoz commented 5 months ago

@bchill Python 3.12 is the culprit here. We only test Python 3.10 at the moment. You'll have better luck with our packages, which ship with 3.10.
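
A quick way to see which side of that line a given install falls on is to compare the interpreter's version against 3.10. This is a minimal sketch, not anything Salt provides: `is_tested_python` and the `TESTED` constant are hypothetical names, and the value (3, 10) is taken only from the comment above. It needs to be run with the same interpreter the Salt package uses, not whatever `python3` happens to be on the PATH.

```python
import sys

# Assumption: (3, 10) reflects the statement above that only Python 3.10 is tested.
TESTED = (3, 10)


def is_tested_python(version_info=None, tested=TESTED):
    """Return True if the major.minor version matches the tested one (hypothetical helper)."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(tested)


# Example call against the running interpreter:
# print(sys.version.split()[0], is_tested_python())
```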

alrf commented 2 months ago

I have the same issue on Fedora CoreOS (FCOS):

Debian-11 Master: salt-3006.6 (onedir installation, /opt/saltstack/salt/bin/python3.10)
FCOS Minion: salt-3006.7 (Python 3.12.2)

On FCOS salt is delivered as a separate repo and package.

vectorsigma commented 1 month ago

Upstream bug

Also the documentation listed above doesn't offer Fedora 39 or Fedora 40 installation instructions, which is a problem now that Fedora 38 is EOL.