saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] An extra return was detected from minion #65301

Open amalaguti opened 1 year ago

amalaguti commented 1 year ago

Description: The master(s) receive duplicate events from a minion when it runs a Salt scheduler task under the following configuration. The master log shows this error:

[ERROR   ] An extra return was detected from minion minion-win-1, please verify the minion, this could be a replay attack

The minion is configured to connect to 2 masters (both master servers are alive and serving requests normally), and an additional returner is configured in the minion config:

master: 
  - 172.21.0.10
  - 172.21.0.11

rawfile_json.filename: 'C:\ProgramData\Salt Project\Salt\var\log\salt\jobs.log'
return:
  - rawfile_json

The Salt scheduler task is set by a state file to use the rawfile_json returner, like this:

set_salt_sched_task:
  schedule.present:
    - function: state.sls
    - job_args:
      - win_tasks.2_win_task_sched
    - seconds: 3600
    - return_job: True
    - returner: rawfile_json 

salt-call schedule.list
local:
    schedule:
      set_salt_sched_task:
        args:
        - win_tasks.2_win_task_sched
        enabled: true
        function: state.sls
        jid_include: True
        kwargs:
          queue: true
        maxrunning: 1
        name: set_salt_sched_task
        return_job: True
        returner: rawfile_json
        saved: true
        seconds: 3600

Setup: Windows minion 3006.1, Salt master 3006.1

Steps to reproduce the behavior: Create the scheduled task and look for errors in the master log. Reduce the interval to seconds: 10 to get more frequent errors.

[ERROR   ] An extra return was detected from minion minion-win-1, please verify the minion, this could be a replay attack
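When reproducing with a short interval, it can help to see which minions trigger the message and how often. The following is a minimal sketch, not part of Salt: it scans master log lines for the error text quoted above and counts occurrences per minion ID. The function names and the log path in the usage note are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the master log message quoted in this issue. The minion ID is taken
# as everything after "from minion" up to the next comma or whitespace.
EXTRA_RETURN = re.compile(r"An extra return was detected from minion ([^\s,]+)")


def count_extra_returns(lines):
    """Return a Counter mapping minion ID -> number of extra-return errors."""
    counts = Counter()
    for line in lines:
        match = EXTRA_RETURN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_extra_returns_in_file(path):
    """Convenience wrapper: count errors in a log file at a caller-supplied path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_extra_returns(handle)
```

For example, `count_extra_returns_in_file("/var/log/salt/master").most_common(10)` would list the ten noisiest minions, assuming the master log lives at that path on your system.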

It does not happen if the minion is not set up in a multimaster configuration, or if the return_job option is set to False in the Salt scheduler task.

Expected behavior ERROR


OrangeDog commented 1 year ago

Can you reproduce in 3006.3?

dwoz commented 1 year ago

This log statement is problematic. On heavily loaded masters, this can happen if minions time out while sending job returns back to the master. return_retry_tries defaults to 3; setting it to 0 would prevent the message from being logged.

I think the best path forward is to change the log message to be more clear as to what is likely happening.
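The timeout scenario described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Salt's implementation: `ToyMaster`, `send_return`, and the `slow_acks` knob are hypothetical, and `retry_tries` is only loosely modeled on return_retry_tries (here it means extra attempts after the first send). The point is that a return can be stored by the master even though the minion never sees the acknowledgement in time, so the retry arrives as a second return for the same job.

```python
class ToyMaster:
    """Stand-in for a master's job cache; not Salt's actual implementation."""

    def __init__(self, slow_acks=0):
        self.seen = set()           # (jid, minion_id) pairs already stored
        self.log = []               # messages the toy master would log
        self.slow_acks = slow_acks  # how many upcoming acks arrive too late

    def handle_return(self, jid, minion_id):
        """Store a return; report whether the ack reached the minion in time."""
        key = (jid, minion_id)
        if key in self.seen:
            self.log.append(
                f"An extra return was detected from minion {minion_id}"
            )
        self.seen.add(key)
        if self.slow_acks > 0:
            self.slow_acks -= 1
            return False  # return was stored, but the minion timed out waiting
        return True


def send_return(master, jid, minion_id, retry_tries=3):
    """Send once, then retry up to retry_tries times while acks time out."""
    for attempt in range(1 + retry_tries):
        if master.handle_return(jid, minion_id):
            return attempt + 1  # number of sends it took
    return None  # gave up without ever seeing an ack
```

With `ToyMaster(slow_acks=1)` and `retry_tries=3`, the second send succeeds and the toy master logs one extra return. With `retry_tries=0`, nothing is logged, but the minion also has no confirmation that its return was received.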

anandarajan-vivekanandam-agilysys commented 11 months ago

This is happening in 3006.3, but it does not occur with the 3004.2 classic package. I am using a master-of-masters (MoM) and multimaster syndic design, which was working fine in 3004.2 with almost 30000 minions (created using swarm, with multiple minion processes on the same machine).

ESNewmanium commented 10 months ago

Seeing the same messages here with all minions and masters running 3006.4. Never saw them before upgrading to 3006. All minions are connected to the same 2 masters, each of which is healthy.

Below is a versions report from one of the masters and one of the minions. These masters have ~1200 minions connected.

master01 >salt --versions-report 
Salt Version:
          Salt: 3006.4

Python Version:
        Python: 3.10.13 (main, Oct  4 2023, 21:54:22) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.13.12
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 7.9.2009 Core
        locale: utf-8
       machine: x86_64
       release: 3.10.0-1160.95.1.el7.x86_64
        system: Linux
       version: CentOS Linux 7.9.2009 Core

minion02 >salt-call --versions-report
Salt Version:
          Salt: 3006.4

Python Version:
        Python: 3.10.13 (main, Oct  4 2023, 21:54:22) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.13.12
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 7.9.2009 Core
        locale: utf-8
       machine: x86_64
       release: 3.10.0-1160.102.1.el7.x86_64
        system: Linux
       version: CentOS Linux 7.9.2009 Core

wolbi commented 8 months ago

We have a similar problem with duplicated events (3500 minions). Our Salt architecture is 1 master-of-masters (MoM) + 3 Salt syndics. The problem occurs when the MoM runs version 3006 (tested with 3006.6). Duplicated events do not occur when the MoM runs version 3005 (3005.5) and the syndics run 3006.6, so we are not updating the MoM to version 3006 for now.

golmaal commented 7 months ago

I'm seeing the same with 2 minions on salt version 3005.4 as well.

Sample - 2024-03-22 12:24:10.797 [salt.loaded.int.returner.local_cache][ERROR ] An extra return was detected from minion minion-id-1, please verify the minion, this could be a replay attack

amalaguti commented 7 months ago

> Can you reproduce in 3006.3?

Haven't tried 3006.3 but it seems fixed in 3006.7

broxio commented 7 months ago

I still encounter "An extra return was detected from minion minion_id please verify the minion, this could be a replay attack". The server is a single master running 3006.7 and the minions are a mix of 3006.x versions. I still see it even for minions that are running 3006.7.

[salt.loaded.int.returner.local_cache][ERROR ] An extra return was detected from minion xxx please verify the minion, this could be a replay attack

dwoz commented 5 months ago

@all please check your minion's return_retry_tries config setting. If it is anything other than 0 you are likely seeing this log message because the minion timed out while trying to send the return to the master.
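One rough way to check the setting in a minion config file is sketched below. It is a simplified line-based reader, not a YAML parser and not part of Salt: it assumes return_retry_tries appears as a top-level integer key, ignores any included config files, and falls back to the default of 3 mentioned earlier in this thread. The helper name is hypothetical.

```python
import re

# Default quoted earlier in this thread; used when the key is not set.
DEFAULT_RETURN_RETRY_TRIES = 3

# Top-level "return_retry_tries: <int>" with an optional trailing comment.
_KEY = re.compile(r"^return_retry_tries\s*:\s*(\d+)\s*(?:#.*)?$")


def effective_return_retry_tries(config_text):
    """Return the configured value, or the default if the key is absent."""
    value = DEFAULT_RETURN_RETRY_TRIES
    for line in config_text.splitlines():
        match = _KEY.match(line)
        if match:
            value = int(match.group(1))  # assume the last occurrence wins
    return value
```

Reading the minion config file into a string and passing it to `effective_return_retry_tries` gives the value this sketch believes is in effect; anything other than 0 means retries are enabled.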

darkpixel commented 5 months ago

The default is 3... and wouldn't you want a minion to retry if it couldn't send a return to the master? Adjusting it to 0 seems sort of like a workaround, but with a bad side effect. No?

IvanShokin commented 3 months ago

We ran into this problem in our test environment. We have up to 10,000 minions on the masters.

alrf commented 1 month ago

3006.8 and 3006.6 are also affected.

corywright commented 1 week ago

We are also seeing this on 3006.7 in a single master configuration.