saltstack / salt

[BUG] Custom runners loaded from 'runner_dirs:' defined in master config file fail to run from Reactor. #61724

Open notsure44 opened 2 years ago

notsure44 commented 2 years ago

Description

A custom Salt Runner runs reliably via salt-run <file_name>.<function_name> <kwargs>. When the same Runner is called via the Reactor, it works at first but then fails with "weakly-referenced object no longer exists". Restarting the salt-master service makes it work again.

salt-run <file_name>.<function_name> <kwargs> continues to work even after the Reactor triggers the "weakly-referenced object no longer exists" error.

Documentation shows that Runners are valid Reactions: Doc link

Setup

My Runner connects to the REST API of NetBox, an IP address management tool.

/etc/salt/master.d/reactor.conf

reactor:
  - netbox/update:
    - salt://reactor/salt-cloud/update_host_info_in_netbox.sls

salt/reactor/salt-cloud/update_host_info_in_netbox.sls

update-netbox:
  runner.netbox.add_record:
    - args:
      - address: {{ data["data"]['ipaddress'] }}
      - dns_name: {{ data["data"]['hostname'] }}

salt/_runners/netbox/__init__.py

See attached file: runner.txt

The master is a virtual machine running on KVM.

Salt Version:
          Salt: 3004

Dependency Versions:
          cffi: 1.14.5
      cherrypy: unknown
      dateutil: 2.6.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: Not Installed
        pygit2: Not Installed
        Python: 3.6.8 (default, Mar 19 2021, 05:13:41)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 19.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 8
        locale: UTF-8
       machine: x86_64
       release: 4.18.0-240.15.1.el8_3.x86_64
        system: Linux
       version: CentOS Linux 8

Steps to Reproduce the behavior

The error can be triggered by sending an event:

salt-run event.send netbox/update '{"data" : {"hostname": "my-hostname", "ipaddress":"1.1.1.1"}}'

This works right after a salt-master restart, but subsequent attempts eventually fail with "weakly-referenced object no longer exists".

Expected behavior

The Runner module itself seems to be fine, as it runs reliably with salt-run. I suspect the Reactor system is garbage collecting the Runner module and not reloading it when it is called again, but I am at the edge of my knowledge here.

Full error

2022-02-25 03:34:44,442 [salt.loader.lazy :791 ][ERROR   ][3848524] Failed to import runners netbox, this is due most likely to a syntax error:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 752, in _load_module
    self._reload_submodules(mod)
  File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 619, in _reload_submodules
    for submodule in submodules:
  File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 615, in <genexpr>
    if isinstance(getattr(mod, sname), mod.__class__)
ReferenceError: weakly-referenced object no longer exists

The message "this is due most likely to a syntax error" is misleading. Looking at salt/loader/lazy.py, it comes from a catch-all exception handler:

except Exception as error:  # pylint: disable=broad-except
    log.error(
        "Failed to import %s %s, this is due most likely to a syntax error:\n",
        self.tag,
        name,
        exc_info=True,
    )
    self.missing_modules[name] = error
    return False
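For context on the error itself: in Python, a ReferenceError with this message is raised when a weakref.proxy is used after the object it points to has been garbage collected. The sketch below is not Salt's code. It only mimics the shape of the generator expression in the traceback (isinstance(getattr(mod, sname), mod.__class__)) on a throwaway module. The names Owner, fake_runner and helper are made up for illustration, and which attribute plays this role inside Salt's loader is not established here.

```python
import gc
import types
import weakref


class Owner:
    """Hypothetical stand-in for whatever object the proxy points at."""


def scan_submodules(mod):
    # Same shape as the generator expression shown in the traceback.
    return [
        sname
        for sname in dir(mod)
        if isinstance(getattr(mod, sname), mod.__class__)
    ]


def demo():
    mod = types.ModuleType("fake_runner")
    owner = Owner()
    mod.helper = weakref.proxy(owner)  # the module only holds a weak proxy

    before = scan_submodules(mod)  # fine while `owner` is still alive

    del owner     # drop the last strong reference
    gc.collect()  # make collection explicit on non-refcounting interpreters

    try:
        scan_submodules(mod)
    except ReferenceError as exc:
        return before, str(exc)
    return before, None


if __name__ == "__main__":
    print(demo())
```

If this is the mechanism at work, it would fit the observation that the failure depends on time passing between calls rather than on the runner's own code, but that remains a guess.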

notsure44 commented 2 years ago

To simplify things I wrote a new Runner that does nothing but write a log.

# -*- coding: utf-8 -*-
'''
Runner to do nothing significant.
'''
import logging

log = logging.getLogger(__name__)

__virtualname__ = "foo"

def __virtual__():
    return __virtualname__

def bar(**kwargs):
    log.warning("This should always work ********************************")
    log.warning(kwargs)

This works, and then fails in the same way. Oddly, if I restart the master and then send an event to trigger the Runner, it continues to work as long as I keep sending the event over and over. If no event triggers the Runner for a few minutes, it fails with the same error:

2022-02-25 20:20:19,581 [salt.loader.lazy :791 ][ERROR   ][3970747] Failed to import runners foo, this is due most likely to a syntax error:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 752, in _load_module
    self._reload_submodules(mod)
  File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 619, in _reload_submodules
    for submodule in submodules:
  File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 615, in <genexpr>
    if isinstance(getattr(mod, sname), mod.__class__)
ReferenceError: weakly-referenced object no longer exists

So to reproduce this issue, create any custom runner and trigger it with an event through the Reactor system. Then wait a few minutes and send another event; you should get the same error.
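The reproduction can be scripted roughly as follows. This is a minimal sketch, assuming salt-run is on PATH and the script runs on the master with sufficient privileges. The helper names build_event_command and reproduce are hypothetical, the tag and payload are the ones from the original report, and idle_seconds=300 is only a guess at "a few minutes".

```python
import json
import subprocess
import time


def build_event_command(tag, payload):
    """Build the argv for: salt-run event.send <tag> '<json payload>'."""
    return ["salt-run", "event.send", tag, json.dumps(payload)]


def reproduce(tag, payload, idle_seconds=300, attempts=2):
    """Send the same event `attempts` times, idling between sends.

    idle_seconds=300 is an assumption standing in for "a few minutes".
    """
    cmd = build_event_command(tag, payload)
    for attempt in range(attempts):
        if attempt > 0:
            time.sleep(idle_seconds)
        # Requires salt-run on PATH and permission to talk to the master.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    reproduce(
        "netbox/update",
        {"data": {"hostname": "my-hostname", "ipaddress": "1.1.1.1"}},
    )
```

The script only sends the events. It does not detect the failure, so the master log and the event bus still need to be watched in separate terminals while it runs.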

Also, the error is not always logged to /var/log/salt/master at info level. It is always visible when watching the event bus with salt-run state.event. I find it best to tail the master log file and watch the event log at the same time.

notsure44 commented 2 years ago

After further troubleshooting I have found that if I move the runner to the distro directory, /usr/lib/python3.6/site-packages/salt/runners/, the problem goes away.

So the issue appears to have something to do with custom runners being loaded from the runner_dirs: directory defined in /etc/salt/master.

notsure44 commented 2 years ago

More specifics. My master runner_dirs: is /srv/salt/_runners, and the Runner lives at /srv/salt/_runners/my_runner/__init__.py. In this layout it always starts to fail when run from a Reactor, as described above. If I rename my Runner and move it to /srv/salt/_runners/my_runner.py, everything works normally. The bug occurs when the Runner is a named directory with an __init__.py file rather than a single .py file.
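To check whether other custom runners use the affected layout, something like the following could be used. It is a small sketch with a hypothetical helper name (find_package_runners) that lists subdirectories containing an __init__.py. The demo builds a temporary tree instead of touching /srv/salt/_runners.

```python
import tempfile
from pathlib import Path


def find_package_runners(runner_dir):
    """Return names of subdirectories of runner_dir that contain __init__.py."""
    root = Path(runner_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "__init__.py").is_file()
    )


def demo():
    """Build a throwaway tree with one runner of each layout and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "my_runner").mkdir()
        (root / "my_runner" / "__init__.py").write_text("")
        (root / "other_runner.py").write_text("")
        return find_package_runners(root)


if __name__ == "__main__":
    print(demo())
```

On a real master the call would be find_package_runners("/srv/salt/_runners"), or whichever path runner_dirs: points at. Any name it returns is a candidate for the single-file workaround described above.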