Open amalaguti opened 1 year ago
Can you reproduce in 3006.3?
This log statement is problematic. On heavily loaded masters this can happen if minions time out while sending job returns back to the master. `return_retry_tries` defaults to `3`; setting it to `0` would prevent the log from happening.
I think the best path forward is to change the log message to be more clear as to what is likely happening.
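As a rough illustration of the setting discussed above, a minimal sketch of the workaround in the minion config. The file path is an assumption (a default Linux install) and is not stated in this thread; only the option name and the values 3 and 0 come from the comment above.

```yaml
# /etc/salt/minion  (assumed path; a drop-in file under minion.d/ would work the same way)

# Default cited in this thread is 3. Setting 0 is the suggested way to
# stop the "extra return" log message from being triggered by retried returns.
return_retry_tries: 0
```

The minion would need a restart to pick up the change. As raised further down the thread, this presumably trades the log noise for returns that are not resent after a timeout, so it is better treated as a diagnostic step than a fix.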
This is happening in 3006.3, but not in the 3004.2 Classic package. I am using an M.O.M and multimaster syndic design, which was working fine in 3004.2 with almost 30000 minions (created using swarm, with multiple minion processes on the same machine).
Seeing the same messages here with all minions and masters running 3006.4. Never saw them before upgrading to 3006. All minions are connected to the same 2 masters, each of which is healthy.
Below is a versions report from one of the masters and one of the minions. These masters have ~1200 minions connected.
master01 >salt --versions-report
Salt Version:
Salt: 3006.4
Python Version:
Python: 3.10.13 (main, Oct 4 2023, 21:54:22) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.12
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: centos 7.9.2009 Core
locale: utf-8
machine: x86_64
release: 3.10.0-1160.95.1.el7.x86_64
system: Linux
version: CentOS Linux 7.9.2009 Core
minion02 >salt-call --versions-report
Salt Version:
Salt: 3006.4
Python Version:
Python: 3.10.13 (main, Oct 4 2023, 21:54:22) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.12
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: centos 7.9.2009 Core
locale: utf-8
machine: x86_64
release: 3.10.0-1160.102.1.el7.x86_64
system: Linux
version: CentOS Linux 7.9.2009 Core
We have a similar problem with duplicated events (3500 minions). Our Salt architecture is 1 master-of-masters (MoM) + 3 Salt syndics. The problem occurs when the MoM runs on version 3006 (tested with 3006.6); duplicated events do not occur when the MoM runs on version 3005 (3005.5) and the syndics on 3006.6, so we are not updating the MoM to version 3006 for now.
I'm seeing the same with 2 minions on salt version 3005.4 as well.
Sample - 2024-03-22 12:24:10.797 [salt.loaded.int.returner.local_cache][ERROR ] An extra return was detected from minion minion-id-1, please verify the minion, this could be a replay attack
Can you reproduce in 3006.3?
Haven't tried 3006.3 but it seems fixed in 3006.7
I still encounter "An extra return was detected from minion minion_id please verify the minion, this could be a replay attack". The server is a single master running 3006.7 and the minions are a mix of 3006.x versions; I still see it even for minions running 3006.7.
[salt.loaded.int.returner.local_cache][ERROR ] An extra return was detected from minion xxx please verify the minion, this could be a replay attack
@all please check your minion's `return_retry_tries` config setting. If it is anything other than `0`, you are likely seeing this log message because the minion timed out while trying to send the return to the master.
The default is 3....and wouldn't you want a minion to retry if it couldn't send a return to the master? Adjusting it to 0 seems sorta like a work-around, but with a bad side effect. No?
We found this problem at our booth. We have up to 10,000 minions on masters.
3006.8 and 3006.6 are also affected.
We are also seeing this on 3006.7 in a single master configuration.
Happening under 3007.1 as well.
Description
Master(s) receive duplicate events from a minion when running a Salt scheduler task under the following configuration; it shows this error on the master log
Minion configured to connect to 2 masters (both master servers alive and servicing OK), with an additional returner configured in the minion config
The Salt scheduler task is set by a state file to use the `rawfile_json` returner this way
Setup
Windows Minion 3006.1, Salt Master 3006.1
Steps to Reproduce the behavior
Create the schedule task and look for errors in the master log; reduce the interval to `seconds: 10` to get more frequent errors.
It does not happen if the minion is not set to a multimaster configuration, or if the option `return_job` is set to `False` in the Salt scheduler task.
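The reporter's original config and state file are not reproduced in this thread. The following is only a sketch of what a setup matching the description might look like: the hostnames, output path, state ID, and scheduled function are all hypothetical placeholders, and only the two-master layout, the `rawfile_json` returner, `seconds: 10`, and `return_job` come from the report.

```yaml
# Minion config (sketch): two masters plus an output file for rawfile_json
master:
  - master01.example.com   # hypothetical hostname
  - master02.example.com   # hypothetical hostname
rawfile_json.filename: /var/log/salt/events.json   # hypothetical path; a Windows minion would use a Windows path
```

```yaml
# State file (sketch): a scheduled job that uses the rawfile_json returner
scheduled_ping:              # hypothetical state ID
  schedule.present:
    - function: test.ping    # hypothetical function; the report does not say which job was scheduled
    - seconds: 10            # short interval, as suggested above, to surface the error more often
    - returner: rawfile_json
    - return_job: True       # per the report, setting this to False avoids the error
```

With a setup along these lines, the report suggests watching the master log for the extra-return error after the schedule has fired a few times.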
Expected behavior ERROR