saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Very strange exception occurs when salt-call run from PHP #1844

Closed borgstrom closed 11 years ago

borgstrom commented 11 years ago

I'm integrating salt into another web app via the publish framework and have run into a real head scratcher.

If I run this publish command from the command line then everything is fine, but once it's run from PHP I get the following traceback in the master process:

[INFO    ] Clear payload received with command _auth
[INFO    ] Authentication request from lab1-boxen.fatbox.ca
[INFO    ] Authentication accepted from lab1-boxen.fatbox.ca
[INFO    ] AES payload received with command _pillar
Process MWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/dist-packages/salt/master.py", line 431, in run
    self.__bind()
  File "/usr/lib/python2.6/dist-packages/salt/master.py", line 369, in __bind
    ret = self.serial.dumps(self._handle_payload(payload))
  File "/usr/lib/python2.6/dist-packages/salt/master.py", line 392, in _handle_payload
    'clear': self._handle_clear}[key](load)
  File "/usr/lib/python2.6/dist-packages/salt/master.py", line 419, in _handle_aes
    return self.aes_funcs.run_func(data['cmd'], data)
  File "/usr/lib/python2.6/dist-packages/salt/master.py", line 934, in run_func
    ret = getattr(self, func)(load)
  File "/usr/lib/python2.6/dist-packages/salt/master.py", line 619, in _pillar
    load['env'])
  File "/usr/lib/python2.6/dist-packages/salt/pillar/__init__.py", line 75, in __init__
    self.matcher = salt.minion.Matcher(self.opts)
  File "/usr/lib/python2.6/dist-packages/salt/minion.py", line 666, in __init__
    functions = salt.loader.minion_mods(self.opts)
  File "/usr/lib/python2.6/dist-packages/salt/loader.py", line 51, in minion_mods
    functions = load.apply_introspection(load.gen_functions())
  File "/usr/lib/python2.6/dist-packages/salt/loader.py", line 430, in gen_functions
    virtual = mod.__virtual__()
  File "/usr/lib/python2.6/dist-packages/salt/modules/groupadd.py", line 14, in __virtual__
    return 'group' if __grains__['kernel'] == 'Linux' else False
KeyError: 'kernel'

The PHP code executing this is pretty straightforward:

        private function salt_call($function) {
                $cmd = "salt-call publish.full_data " . $this->hostname . " $function";

                # run salt call and get our output
                $out = shell_exec("$cmd 2>/dev/null");
                if ($out === null) {
                        $this->addError("Failed to execute salt-call");
                        return false;
                }

It's running under php-fpm if it makes any difference.

I'm going to keep hacking at it and see if I can get anywhere with it but thought I'd open a ticket in case there's something that jumps out at anyone else.

thatch45 commented 11 years ago

Basically, the grains are not making it out to the master. For some reason, when the command is invoked from PHP, it is unable to generate all of the grains. Can you run salt-call grains.items from PHP and look at the output?
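To illustrate why a missing grain surfaces as the KeyError in the traceback, here is a minimal sketch. The plain dict is a stand-in for the __grains__ dict that Salt injects into modules, and virtual_tolerant is a hypothetical variant, not something taken from Salt itself.

```python
# Stand-in for __grains__; it deliberately lacks 'kernel', as in the failing case.
grains = {'id': 'lab1-boxen.fatbox.ca', 'shell': '/bin/sh'}


def virtual_strict(grains):
    # Same shape as the check in groupadd.py: raises KeyError if the grain is absent.
    return 'group' if grains['kernel'] == 'Linux' else False


def virtual_tolerant(grains):
    # Hypothetical variant: a missing grain simply means "do not load this module".
    return 'group' if grains.get('kernel') == 'Linux' else False


try:
    virtual_strict(grains)
    strict_result = 'no error'
except KeyError as exc:
    strict_result = 'KeyError: {0!r}'.format(exc.args[0])

tolerant_result = virtual_tolerant(grains)
```

A tolerant check would only hide the symptom; the underlying question is still why the kernel grain was never generated.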

borgstrom commented 11 years ago

From the web process all invocations of salt-call result in that same exception. If I run it through the cli version of PHP then everything works as expected.

Is it possible that the environment the fastcgi process provides to salt-call somehow prevents it from loading some grains?

Since the exception always happened in groupadd.py, I used pprint to dump __grains__, and there are some things in there, just not kernel. (??)

[INFO    ] Clear payload received with command _auth
[INFO    ] Authentication request from lab1-boxen.fatbox.ca
[INFO    ] Authentication accepted from lab1-boxen.fatbox.ca
[INFO    ] AES payload received with command _pillar
{'domain': 'fatbox.ca',
 'fqdn': 'lab1-boxen.fatbox.ca',
 'host': 'lab1-boxen',
 'id': 'lab1-boxen.fatbox.ca',
 'localhost': 'lab1-boxen',
 'pythonpath': ['/usr/bin',
                '/usr/local/lib/python2.6/dist-packages/ipaddr-2.1.10-py2.6.egg',
                '/usr/local/lib/python2.6/dist-packages/Sider-0.2.0-py2.6.egg',
                '/usr/local/lib/python2.6/dist-packages/redis-2.6.2-py2.6.egg',
                '/usr/lib/python2.6',
                '/usr/lib/python2.6/plat-linux2',
                '/usr/lib/python2.6/lib-tk',
                '/usr/lib/python2.6/lib-old',
                '/usr/lib/python2.6/lib-dynload',
                '/usr/local/lib/python2.6/dist-packages',
                '/usr/lib/python2.6/dist-packages',
                '/usr/lib/pymodules/python2.6',
                '/usr/lib/python2.6/dist-packages/wx-2.8-gtk2-unicode'],
 'pythonversion': [2, 6, 6, 'final', 0],
 'saltpath': '/usr/lib/python2.6/dist-packages/salt',
 'saltversion': '0.10.2',
 'server_id': 1557383467,
 'shell': '/bin/sh'}
Process MWorker-5:
Traceback (most recent call last):
  ... <same as before> ...

borgstrom commented 11 years ago

Ok, I can now reproduce it from the console by running the script as the www-data user via sudo.

I've done some debugging and it is a permissions problem. Here's the actual traceback of what's causing the problem:

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/salt/grains/core.py", line 375, in os_data
    grains.update(_kernel())
  File "/usr/lib/python2.6/dist-packages/salt/grains/core.py", line 36, in _kernel
    grains['kernel'] = __salt__['cmd.run']('uname -s').strip()
  File "/usr/lib/python2.6/dist-packages/salt/modules/cmdmod.py", line 150, in _run_quiet
    quiet=True, shell=shell, env=env)['stdout']
  File "/usr/lib/python2.6/dist-packages/salt/modules/cmdmod.py", line 114, in _run
    proc = subprocess.Popen(cmd, **kwargs)
  File "/usr/lib/python2.6/subprocess.py", line 623, in __init__
    errread, errwrite)
  File "/usr/lib/python2.6/subprocess.py", line 1141, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied: '/root'
CRITICAL:salt.loader:Failed to load grains defined in grain file core.os_data in function <function os_data at 0x2668578>, error: [Errno 13] Permission denied: '/root'

In cmdmod.py in _run there's:

# Set the default working directory to the home directory
# of the user salt-minion is running as.  Default:  /root
if not cwd:
    cwd = os.path.expanduser('~{0}'.format('' if not runas else runas))

Since /root is 0700, www-data can't use it as a cwd.

Thoughts on how to handle this? Do we make / the cwd by default if there's no runas?

borgstrom commented 11 years ago

One more piece of info.

If I su - www-data and then run the script via the PHP CLI, everything works and os.path.expanduser does select /var/www as the cwd. So it appears that php-fpm is invoking its children in a similar manner to sudo, and ~ is expanding to /root.

Our Python web apps run as a non-privileged user, so their ~ expands to their home directories as well, which is why this never came up before in our web integrations.
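The behaviour described above is consistent with how os.path.expanduser works on POSIX: a bare ~ is taken from the HOME environment variable when it is set, and the passwd database is only consulted when HOME is missing. The sketch below shows just that dependency. Whether php-fpm or sudo actually hand /root to the child as HOME on a given system is an assumption to verify locally.

```python
import os

saved_home = os.environ.get('HOME')
try:
    # A child that inherits HOME=/root expands ~ to /root, whatever user it runs as.
    os.environ['HOME'] = '/root'
    expanded_with_root_home = os.path.expanduser('~')

    # After a full login (su - www-data), HOME points at the user's own home.
    os.environ['HOME'] = '/var/www'
    expanded_with_www_home = os.path.expanduser('~')
finally:
    # Put the original value back so the demonstration has no side effects.
    if saved_home is None:
        os.environ.pop('HOME', None)
    else:
        os.environ['HOME'] = saved_home
```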

borgstrom commented 11 years ago

Once I fixed the permission denied error (by defaulting cwd to / if no runas was provided) I ran into yet another issue in the os_data function:

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/salt/grains/core.py", line 500, in os_data
    grains.update(_hw_data(grains))
  File "/usr/lib/python2.6/dist-packages/salt/grains/core.py", line 669, in _hw_data
    grains.update(_dmidecode_data(linux_dmi_regex))
  File "/usr/lib/python2.6/dist-packages/salt/grains/core.py", line 603, in _dmidecode_data
    if not salt.utils.which('dmidecode'):
  File "/usr/lib/python2.6/dist-packages/salt/utils/__init__.py", line 248, in which
    for path in os.environ.get('PATH').split(os.pathsep):
AttributeError: 'NoneType' object has no attribute 'split'

And here's the ultra-simplistic environment supplied by the FastCGI process:

{'HOME': '/var/www', 'PWD': '/var/www/', 'USER': 'www-data'}
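For anyone wanting to reproduce this outside php-fpm, here is a rough sketch that starts a child Python interpreter with exactly that environment and asks it what it sees for PATH. It assumes a POSIX system where the interpreter can start without any other environment variables.

```python
import subprocess
import sys

# The environment reported above; note that PATH is absent.
fastcgi_env = {'HOME': '/var/www', 'PWD': '/var/www/', 'USER': 'www-data'}

# Run a child interpreter with only those variables and print its view of PATH.
child_code = "import os; print(repr(os.environ.get('PATH')))"
result = subprocess.run(
    [sys.executable, '-c', child_code],
    env=fastcgi_env,
    cwd='/',
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
path_seen_by_child = result.stdout.strip()
```

With no PATH in the environment, os.environ.get('PATH') returns None in the child, which is the value that .split() was called on in the traceback.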

A pull request to use a sane default PATH in salt.utils.which is coming in a second.
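A minimal sketch of what such a fallback could look like follows. It is not the code from the pull request: the DEFAULT_PATH value and the environ parameter are assumptions made for illustration.

```python
import os

# Assumed fallback; the value used in the actual patch is not shown in this thread.
DEFAULT_PATH = '/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin'


def which(exe, environ=None):
    """Return the full path of exe if it is found on PATH, else None."""
    environ = os.environ if environ is None else environ
    # Fall back to the default when PATH is missing or empty.
    search_path = environ.get('PATH') or DEFAULT_PATH
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, exe)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Look up a common binary using an environment that has no PATH at all.
found = which('sh', environ={'HOME': '/var/www', 'USER': 'www-data'})
```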

thatch45 commented 11 years ago

Sorry I have not been around this weekend on these, but this patch looks good. I am coming up to speed now.

borgstrom commented 11 years ago

With #1849 now closed there is only one last thing for this issue: The default cwd for _run.

As noted above, when run from things like sudo or php-fpm, ~ expands to /root, which is 0700, so the user that has been changed to (i.e. www-data) cannot use /root as the default cwd.

My "solution" (that I'm running in my lab) is to use / as the cwd if no runas parameter is supplied. Seems sane to me but wanted your input before I sent a pull request.

thatch45 commented 11 years ago

I think that this would break a lot of assumptions that people make about the default cwd. Do you think we could make a try/except block so that we can fall back to / if ~ does not work out?

borgstrom commented 11 years ago

Fair enough -- People and their assumptions. pbbttth :)

The try/except block would need to be much further down _run, so I just used os.access to check if we have R_OK on the cwd and default to / if the check fails. This was just tested in the lab and is working well.
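Roughly, the check described above could be sketched like this. The default_cwd helper is a hypothetical name used for illustration; in Salt the logic lives inline in _run, and the exact code in the pull request may differ.

```python
import os


def default_cwd(runas=None):
    """Pick a working directory: the user's home if readable, otherwise /."""
    # Same expansion as in cmdmod.py: ~ for the current user, ~name for runas.
    cwd = os.path.expanduser('~{0}'.format('' if not runas else runas))
    # Fall back to / when the home directory cannot be read (e.g. /root at 0700).
    if not os.access(cwd, os.R_OK):
        cwd = '/'
    return cwd
```

Two caveats on this sketch: os.access tests against the real uid and gid rather than the effective ones, and entering a directory strictly needs execute (search) permission, so checking os.X_OK as well may be worth considering.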

Pull request coming in a second.

thatch45 commented 11 years ago

Good man, I greatly appreciate your help @borgstrom ! I need to get you some tshirts :)