saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] Minion loses the connection after updating from 3004 to 3005 onedir #63325

Open monofumado opened 1 year ago

monofumado commented 1 year ago

Description I want to update my environment to 3005. So far I have updated my test master to 3005. All new minions with a fresh 3005 installation seem to work correctly, but minions that I update from 3004 to 3005 lose the connection.

Setup Minion running the classic 3004 package, already connected to the master (master running 3005 onedir). I then update the minion to 3005 onedir. Everything seems normal on the minion side: there are no errors and the service runs normally. However, I can no longer reach the minion from the master and get this error:

    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command: 


Steps to Reproduce the behavior

Expected behavior Communication from the master to the minion remains intact after the update.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

```yaml
Salt Version:
  Salt: 3005.1

Dependency Versions:
  cffi: 1.14.6
  cherrypy: 18.6.1
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.0
  libgit2: Not Installed
  M2Crypto: 0.38.0
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.9.8
  pygit2: Not Installed
  Python: 3.9.15 (main, Nov 8 2022, 03:42:58)
  python-gnupg: 0.4.8
  PyYAML: 5.4.1
  PyZMQ: 23.2.0
  smmap: Not Installed
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: ubuntu 22.04 jammy
  locale: utf-8
  machine: x86_64
  release: 5.15.0-56-generic
  system: Linux
  version: Ubuntu 22.04 jammy
```


OrangeDog commented 1 year ago

Possibly related to #62881

rayddteam commented 1 year ago

Hey guys! In my case only one minion started losing its ZMQ connection. After some investigation I found that the minion disconnected because the status beacon was enabled (status: []). Most likely it is related to the amount of data, or to some special characters, that the beacon collects on a FreeBSD workstation. Thanks!
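For anyone who wants to test whether the status beacon is the trigger on their own minion, here is a minimal sketch of the relevant minion config. It assumes the beacon is defined in the minion config file (not in pillar), and the file path in the comment is only an example. The `enabled: False` entry is the generic per-beacon switch as I understand it; please check it against the beacons documentation for your Salt version before relying on it.

```yaml
# Example location only (assumption): /etc/salt/minion.d/beacons.conf
# On FreeBSD the config directory is usually different.

# Configuration that was active when the disconnects occurred:
# beacons:
#   status: []

# Same beacon kept in the config but switched off for testing:
beacons:
  status:
    - enabled: False
```

Restart the minion after the change, then check whether it still drops off the master.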

BlackMetalz commented 1 year ago

Confirmed

monofumado commented 1 year ago

Any update on this? It also happens with 3006.1.

djmmatracki commented 9 months ago

I updated from 3004.2 to 3006.5 and am seeing the same issue. After some time, minions stop responding to test.ping. The update to 3006.5 used onedir.

djmmatracki commented 9 months ago

@monofumado Did you find a fix or workaround for this issue?

monofumado commented 8 months ago

Unfortunately not. I updated the server but am still keeping the minions on 3004 because of the problem. If I update the minions to 3005 or 3006, they keep disconnecting from the master. The "workaround" is to restart the minions, but that does not really work for us since we don't have access to all devices.

cdalvaro commented 8 months ago

I had a similar issue with my minions running on macOS (https://github.com/saltstack/salt/issues/64153)

I found that running the highstate with return_job: true made the minions lose their connection to the master after the first run. Setting return_job: false fixed my issue.
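A minimal sketch of that setting, assuming the highstate is run through the minion scheduler, which is where the `return_job` option applies. The job name `highstate_job` and the 60-minute interval are placeholders, not values from my setup.

```yaml
# Minion config or pillar (sketch; job name and interval are hypothetical)
schedule:
  highstate_job:
    function: state.highstate
    minutes: 60
    # Do not send the scheduled job's return data back to the master
    return_job: false
```

With `return_job: false` the master no longer receives the results of the scheduled run, so this trades visibility for a stable connection.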