saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] Without use_master_when_local=True minion ignores master parameter #57866

Open litnimax opened 3 years ago

litnimax commented 3 years ago

In this config:

file_client: local
master: 1.2.3.4
use_master_when_local: False

the minion tries to connect to 127.0.0.1:

21:19:31 - salt.minion:245 - DEBUG - Master URI: tcp://127.0.0.1:4506

But if I set use_master_when_local: True, the minion connects to the configured master:

21:22:58 - salt.transport.zeromq:258 - DEBUG - Connecting the Minion to the Master URI (for the return server): tcp://1.2.3.4:4506
21:22:58 - salt.transport.zeromq:1300 - DEBUG - Trying to connect to: tcp://1.2.3.4:4506
litnimax commented 3 years ago

Moreover, setting use_master_when_local: True when master_type: disabled always produces an error:

root@devmax:/srv/salt# salt-minion
21:53:35 - salt.minion:542 - WARNING - Master is set to disable, skipping connection
21:53:39 - salt.minion:1949 - WARNING - The minion function caused an exception
Exception ignored in: <bound method AsyncZeroMQReqChannel.__del__ of <salt.transport.zeromq.AsyncZeroMQReqChannel object at 0x7fdb287b1a90>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/salt/transport/zeromq.py", line 303, in __del__
    with self._refcount_lock:
AttributeError: 'AsyncZeroMQReqChannel' object has no attribute '_refcount_lock'
Exception ignored in: <bound method RemotePillar.__del__ of <salt.pillar.RemotePillar object at 0x7fdb28690a90>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/salt/pillar/__init__.py", line 358, in __del__
    self.destroy()
  File "/usr/local/lib/python3.6/dist-packages/salt/pillar/__init__.py", line 354, in destroy
    self.channel.close()
AttributeError: 'RemotePillar' object has no attribute 'channel'
s0undt3ch commented 3 years ago

As for the config:

file_client: local
master: 1.2.3.4
use_master_when_local: False

This is confirmed. When we set use_master_when_local: True, as expected, it tries to connect to the master set in the config.
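The behaviour confirmed above can be summarised in a short sketch. It is only a model of what the logs in this thread show, not Salt's actual implementation: the helper names effective_master and master_uri are hypothetical, and the fallback values ("remote" for file_client, "salt" for master, False for use_master_when_local) are assumed defaults.

```python
def effective_master(opts):
    """Return the master host the minion appears to use, per this report.

    Hypothetical helper: it mirrors the observed behaviour only.
    """
    file_client = opts.get("file_client", "remote")      # assumed default
    use_master = opts.get("use_master_when_local", False)  # assumed default
    if file_client == "local" and not use_master:
        # Observed in the DEBUG log: the configured master is ignored.
        return "127.0.0.1"
    return opts.get("master", "salt")                     # assumed default


def master_uri(opts, port=4506):
    """Build the URI in the same shape as the 'Master URI' log line."""
    return "tcp://{}:{}".format(effective_master(opts), port)


reported = {
    "file_client": "local",
    "master": "1.2.3.4",
    "use_master_when_local": False,
}
print(master_uri(reported))
print(master_uri(dict(reported, use_master_when_local=True)))
```

With the reported config the sketch yields tcp://127.0.0.1:4506, and flipping use_master_when_local to True yields tcp://1.2.3.4:4506, matching the two log excerpts in the original report.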

When setting use_master_when_local: True and master_type: disabled, we get:

[ERROR ] Invalid keyword 'disabled' for variable 'master_type'

When setting use_master_when_local: True and master_type: disable, the salt-minion starts:

/testing # salt-minion
[WARNING ] Error loading grains, unexpected linux_gpu_data output, check that you have a valid shell configured and permissions to run lspci command
[WARNING ] Master is set to disable, skipping connection
[WARNING ] Error loading grains, unexpected linux_gpu_data output, check that you have a valid shell configured and permissions to run lspci command

However, when calling salt-call we get a similar error (with or without --local):

/testing # salt-call test.ping
[WARNING ] Error loading grains, unexpected linux_gpu_data output, check that you have a valid shell configured and permissions to run lspci command
[WARNING ] Master is set to disable, skipping connection 
[ERROR   ] An un-handled exception was caught by salt's global exception handler:
KeyError: 'master_uri'              
Traceback (most recent call last):
  File "/usr/local/bin/salt-call", line 8, in <module>
    sys.exit(salt_call())                             
  File "/usr/local/lib/python3.7/site-packages/salt/scripts.py", line 472, in salt_call
    client.run()                                                                                                
  File "/usr/local/lib/python3.7/site-packages/salt/cli/call.py", line 48, in run
    caller = salt.cli.caller.Caller.factory(self.config)                         
  File "/usr/local/lib/python3.7/site-packages/salt/cli/caller.py", line 64, in factory
    return ZeroMQCaller(opts, **kwargs)                                                                         
  File "/usr/local/lib/python3.7/site-packages/salt/cli/caller.py", line 329, in __init__
    super(ZeroMQCaller, self).__init__(opts)                                                                    
  File "/usr/local/lib/python3.7/site-packages/salt/cli/caller.py", line 89, in __init__
    self.minion = salt.minion.SMinion(opts)                                                                     
  File "/usr/local/lib/python3.7/site-packages/salt/minion.py", line 922, in __init__
    self.gen_modules(initial_load=True, context=context or {})                       
  File "/usr/local/lib/python3.7/site-packages/salt/minion.py", line 456, in gen_modules
    pillarenv=self.opts.get("pillarenv"),                                                                       
  File "/usr/local/lib/python3.7/site-packages/salt/pillar/__init__.py", line 101, in get_pillar
    extra_minion_data=extra_minion_data,                                                                        
  File "/usr/local/lib/python3.7/site-packages/salt/pillar/__init__.py", line 301, in __init__
    self.channel = salt.transport.client.ReqChannel.factory(opts)                             
  File "/usr/local/lib/python3.7/site-packages/salt/transport/client.py", line 28, in factory
    AsyncReqChannel.factory, (opts,), kwargs, loop_kwarg="io_loop",                          
  File "/usr/local/lib/python3.7/site-packages/salt/utils/asynchronous.py", line 70, in __init__
    self.obj = cls(*args, **kwargs)                                                                             
  File "/usr/local/lib/python3.7/site-packages/salt/transport/client.py", line 133, in factory
    return salt.transport.zeromq.AsyncZeroMQReqChannel(opts, **kwargs)                        
  File "/usr/local/lib/python3.7/site-packages/salt/transport/zeromq.py", line 178, in __new__
    obj.__singleton_init__(opts, **kwargs)                                                                      
  File "/usr/local/lib/python3.7/site-packages/salt/transport/zeromq.py", line 255, in __singleton_init__
    self.auth = salt.crypt.AsyncAuth(self.opts, io_loop=self._io_loop)                                   
  File "/usr/local/lib/python3.7/site-packages/salt/crypt.py", line 491, in __new__
    key = cls.__key(opts)                                                                                       
  File "/usr/local/lib/python3.7/site-packages/salt/crypt.py", line 510, in __key
    opts["master_uri"],  # master ID                                                                            
KeyError: 'master_uri'              
Traceback (most recent call last):
  File "/usr/local/bin/salt-call", line 8, in <module>                                                          
    sys.exit(salt_call())         
  File "/usr/local/lib/python3.7/site-packages/salt/scripts.py", line 472, in salt_call       
    client.run()             
  File "/usr/local/lib/python3.7/site-packages/salt/cli/call.py", line 48, in run
    caller = salt.cli.caller.Caller.factory(self.config)               
  File "/usr/local/lib/python3.7/site-packages/salt/cli/caller.py", line 64, in factory
    return ZeroMQCaller(opts, **kwargs)                                                                         
  File "/usr/local/lib/python3.7/site-packages/salt/cli/caller.py", line 329, in __init__
    super(ZeroMQCaller, self).__init__(opts)                                                                    
  File "/usr/local/lib/python3.7/site-packages/salt/cli/caller.py", line 89, in __init__
    self.minion = salt.minion.SMinion(opts)                                                                     
  File "/usr/local/lib/python3.7/site-packages/salt/minion.py", line 922, in __init__
    self.gen_modules(initial_load=True, context=context or {})
  File "/usr/local/lib/python3.7/site-packages/salt/minion.py", line 456, in gen_modules
    pillarenv=self.opts.get("pillarenv"),
  File "/usr/local/lib/python3.7/site-packages/salt/pillar/__init__.py", line 101, in get_pillar
    extra_minion_data=extra_minion_data,
  File "/usr/local/lib/python3.7/site-packages/salt/pillar/__init__.py", line 301, in __init__
    self.channel = salt.transport.client.ReqChannel.factory(opts)
  File "/usr/local/lib/python3.7/site-packages/salt/transport/client.py", line 28, in factory
    AsyncReqChannel.factory, (opts,), kwargs, loop_kwarg="io_loop",
  File "/usr/local/lib/python3.7/site-packages/salt/utils/asynchronous.py", line 70, in __init__
    self.obj = cls(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/transport/client.py", line 133, in factory
    return salt.transport.zeromq.AsyncZeroMQReqChannel(opts, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/transport/zeromq.py", line 178, in __new__
    obj.__singleton_init__(opts, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/transport/zeromq.py", line 255, in __singleton_init__
    self.auth = salt.crypt.AsyncAuth(self.opts, io_loop=self._io_loop)
  File "/usr/local/lib/python3.7/site-packages/salt/crypt.py", line 491, in __new__
    key = cls.__key(opts)
  File "/usr/local/lib/python3.7/site-packages/salt/crypt.py", line 510, in __key
    opts["master_uri"],  # master ID
KeyError: 'master_uri'
Exception ignored in: <function AsyncZeroMQReqChannel.__del__ at 0x7f38dde099e0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/salt/transport/zeromq.py", line 303, in __del__
    with self._refcount_lock:
AttributeError: 'AsyncZeroMQReqChannel' object has no attribute '_refcount_lock'
Exception ignored in: <function RemotePillar.__del__ at 0x7f38de0da290>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/salt/pillar/__init__.py", line 358, in __del__
    self.destroy()
  File "/usr/local/lib/python3.7/site-packages/salt/pillar/__init__.py", line 354, in destroy
    self.channel.close()
AttributeError: 'RemotePillar' object has no attribute 'channel'
waynew commented 3 years ago

@litnimax Great talk at SaltConf! Glancing at this, I wonder if changing

  File "/usr/local/lib/python3.7/site-packages/salt/crypt.py", line 510, in __key
    opts["master_uri"],  # master ID
KeyError: 'master_uri'

to opts.get('master_uri') would help.

Though I expect the answer is "no"; this is probably a much larger issue 🤔
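To make the suggestion concrete, here is a minimal sketch of the difference between the two lookups. The functions auth_key_strict and auth_key_lenient are hypothetical stand-ins, not the real salt.crypt code, and the two-field tuple is an assumption made for illustration.

```python
def auth_key_strict(opts):
    # Same lookup style as the line in the traceback: raises if the key is absent.
    return (opts["id"], opts["master_uri"])


def auth_key_lenient(opts):
    # The suggested variant: a missing key becomes None instead of an exception.
    return (opts.get("id"), opts.get("master_uri"))


# Hypothetical options for a minion whose master connection was skipped,
# so master_uri was never filled in.
opts = {"id": "minion-1", "master_type": "disable"}

try:
    auth_key_strict(opts)
except KeyError as exc:
    print("strict lookup failed:", exc)

print(auth_key_lenient(opts))
```

The lenient version avoids the KeyError, but it only moves the problem: the key now contains None, and whether the code that later uses it tolerates a missing master URI is not something this thread establishes.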

litnimax commented 3 years ago

Thanks!!! I will test

sagetherage commented 3 years ago

The Core team won't be able to get to this in Aluminium, but we will review any PRs submitted.

rrrix commented 1 year ago

Just ran into this, but in an even worse way that killed my dev server :(

With this config:

# Masterless Minion
master_type: disable
file_client: local

file_roots:
  base:
    - /srv/salt/state
pillar_roots:
  base:
    - /srv/salt/pillar

fileserver_backend:
  - roots

metadata_server_grains: True

And then I (accidentally) forgot to disable the salt-minion systemd service while using my masterless minion. My /var/log/salt/minion file quickly filled up my ~30GB root disk in about 2.5 hours.

# ls -lh /var/log/salt/minion
-rw-r----- 1 root root 29G Sep 20 18:44 /var/log/salt/minion

Repeated ad infinitum:

2022-09-20 16:16:36,397 [salt.minion      :534 ][WARNING ][627] Master is set to disable, skipping connection
2022-09-20 16:16:36,397 [salt.minion      :1160][CRITICAL][627] Unexpected error while connecting to salt
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/minion.py", line 1134, in _connect_minion
    yield minion.connect_master(failed=failed)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1070, in run
    yielded = self.gen.send(value)
  File "/usr/lib/python3/dist-packages/salt/minion.py", line 1365, in connect_master
    self.req_channel = salt.transport.client.AsyncReqChannel.factory(
  File "/usr/lib/python3/dist-packages/salt/transport/client.py", line 83, in factory
    return salt.channel.client.AsyncReqChannel.factory(opts, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/channel/client.py", line 127, in factory
    auth = salt.crypt.AsyncAuth(opts, io_loop=io_loop)
  File "/usr/lib/python3/dist-packages/salt/crypt.py", line 506, in __new__
    key = cls.__key(opts)
  File "/usr/lib/python3/dist-packages/salt/crypt.py", line 525, in __key
    opts["master_uri"],  # master ID
KeyError: 'master_uri'

~20 million exceptions between 2022-09-20 16:16:36,385 (first log entry) and 2022-09-20 18:44:12,741 (last log entry) - about 2.5 hours.

# grep 'opts\["master_uri"\]' /var/log/salt/minion | wc -l
20623786
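For a sense of scale, the rate can be worked out from the two timestamps and the grep count quoted above. This is plain arithmetic on the numbers in this comment; the format string is an assumption about how the log timestamps are laid out.

```python
from datetime import datetime

# Timestamp layout assumed from the log lines above (comma before milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"

first = datetime.strptime("2022-09-20 16:16:36,385", FMT)
last = datetime.strptime("2022-09-20 18:44:12,741", FMT)
exceptions = 20623786  # output of the grep | wc -l above

elapsed = (last - first).total_seconds()
rate = exceptions / elapsed

print(f"{elapsed:.0f} s elapsed, about {rate:.0f} exceptions per second")
```

That works out to a little over 2,300 exceptions per second, which is consistent with a retry loop that has no delay between attempts.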
rrrix commented 1 year ago

This works:

diff --git a/salt/minion.py b/salt/minion.py
index cecc4f4adf..40a1584f06 100644
--- a/salt/minion.py
+++ b/salt/minion.py
@@ -533,7 +533,7 @@ class MinionBase:
         if opts["master_type"] == "disable":
             log.warning("Master is set to disable, skipping connection")
             self.connected = False
-            raise salt.ext.tornado.gen.Return((None, None))
+            raise SaltSystemExit(1, "Master Connection Disabled")

         # Run masters discovery over SSDP. This may modify the whole configuration,
         # depending of the networking and sets of masters.

As far as I can tell, master_type: disable was never actually tested, at least not thoroughly. The previous line, raise salt.ext.tornado.gen.Return((None, None)), has effectively no effect, because the function always runs as a coroutine generator, which itself lives inside an infinite loop.
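The failure mode described here can be reproduced in miniature. The sketch below uses asyncio rather than Salt's bundled Tornado, and resolve_master, connect_master and count_failures are hypothetical stand-ins shaped after the traceback, so treat it as an illustration of the pattern rather than a copy of salt/minion.py. The loop is capped so the example terminates; the loop in the report apparently is not.

```python
import asyncio


async def resolve_master(opts):
    # Stand-in for the early return in the diff: it hands back (None, None)
    # and never fills in master_uri.
    if opts["master_type"] == "disable":
        return None, None
    return opts["master"], object()


async def connect_master(opts):
    await resolve_master(opts)
    # The caller carries on regardless and needs master_uri.
    return opts["master_uri"]


async def count_failures(opts, max_attempts):
    failures = 0
    for _ in range(max_attempts):  # bounded here; reported as unbounded
        try:
            await connect_master(opts)
            break
        except KeyError as exc:
            failures += 1
            print("attempt", failures, "failed with KeyError:", exc)
    return failures


print(asyncio.run(count_failures({"master_type": "disable"}, 3)))
```

Every attempt fails the same way, so without a cap or a delay the loop logs an exception as fast as it can run. Raising an exit exception at the early return, as in the diff above, breaks out of the loop instead of handing control back to it.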

I'm looking a little bit more to see what test cases might look like for a PR, but I'm afraid I might have to build out a non-trivial amount of code to test if any Salt daemon should self-exit, given a certain configuration (such as this).

This patch also may not be viable for a scenario such as:

jamest-pin commented 10 months ago

I had a similar issue, except I was not using use_master_when_local at all. On top of that, the error message was completely wrong, which led me to spend hours and days troubleshooting this.

My config:

---
master: salt-master
file_client: local
...

Salt was saying

Error while bringing up minion for multi-master. Is master at salt-master responding?

So I spent ages troubleshooting network, permissions, SELinux, etc.

Only when looking in the DEBUG logs did I see that salt was actually trying to connect to

Master URI: tcp://127.0.0.1:4506

The same happened when I put a direct IP in for the master address. I then discovered that the file_client: local key was causing the salt minion to look for the master locally.

Hopefully this can be fixed, or at least the log output, so that future troubleshooters are not as misled as I was.
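Until the log output improves, a small pre-flight check on the parsed minion config can flag the combinations reported in this thread. This is a hypothetical helper, not part of Salt: config_warnings and its messages are made up for illustration, and it assumes the config has already been loaded into a dict.

```python
def config_warnings(opts):
    """Return human-readable warnings for option combinations seen in this issue."""
    warnings = []
    local = opts.get("file_client", "remote") == "local"  # assumed default
    use_master = opts.get("use_master_when_local", False)  # assumed default

    if local and "master" in opts and not use_master:
        warnings.append(
            "file_client is 'local' and use_master_when_local is not True: "
            "master %r will be ignored and 127.0.0.1 used instead" % opts["master"]
        )
    if opts.get("master_type") == "disable" and use_master:
        warnings.append(
            "master_type is 'disable' but use_master_when_local is True: "
            "this combination was reported to raise KeyError: 'master_uri'"
        )
    return warnings


for message in config_warnings({"master": "salt-master", "file_client": "local"}):
    print("WARNING:", message)
```

Run against the config from this comment, it prints one warning naming salt-master as the ignored value, which is the hint the "Is master at salt-master responding?" message does not give.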