saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

Non-root Users Unable to Review Job Status #55275

Open jpittiglio opened 5 years ago

jpittiglio commented 5 years ago

Description of Issue

Followed the instructions to set up non-root users with the ability to run jobs, as specified at https://docs.saltstack.com/en/latest/ref/publisheracl.html

Running jobs as non-root user completes as expected:

[ec2-user@salt ~]$ salt 'salt' test.ping
salt:
    True

Similarly, running jobs using the --async flag works as expected:

[ec2-user@salt ~]$ salt 'salt' test.ping --async

Executed command with job ID: 20191112202301098417

However, attempting to view a previous job's results using salt-run jobs.lookup_jid <x> or the salt.client.LocalClient.get_cli_returns function fails. Example:

[ec2-user@salt ~]$ salt-run jobs.lookup_jid 20191112193054933477
Exception occurred in runner jobs.lookup_jid: Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/salt/client/mixins.py", line 381, in low
    data['return'] = func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/salt/runners/jobs.py", line 128, in lookup_jid
    display_progress=display_progress
  File "/usr/lib/python3.7/site-packages/salt/runners/jobs.py", line 200, in list_job
    ret['Result'] = mminion.returners['{0}.get_jid'.format(returner)](jid)
  File "/usr/lib/python3.7/site-packages/salt/returners/local_cache.py", line 357, in get_jid
    with salt.utils.files.fopen(retp, 'rb') as rfh:
  File "/usr/lib/python3.7/site-packages/salt/utils/files.py", line 399, in fopen
    f_handle = open(*args, **kwargs)  # pylint: disable=resource-leakage
PermissionError: [Errno 13] Permission denied: '/var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/salt/return.p'

Per the linked documentation, reviewing the root level directory shows expected permissions:

[ec2-user@salt ~]$ ll /var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/
total 4
-rw-r--r-- 1 root root 20 Nov 12 19:30 jid
drwxr-xr-x 2 root root 22 Nov 12 19:30 salt

However, the return.p file is readable and writable by root only:

[ec2-user@salt ~]$ ll /var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/salt/
total 4
-rw------- 1 root root 27 Nov 12 19:30 return.p

Setup

Standard RPM installation on an AWS EC2 instance running Amazon Linux 2. Configured to allow ec2-user to run all states on all nodes with the following in /etc/salt/master:

publisher_acl:
  ec2-user:
    - .*

Executed chmod 755 /var/cache/salt /var/cache/salt/master /var/cache/salt/master/jobs /var/run/salt /var/run/salt/master as indicated in the linked documentation.

Possible Solution

The issue seems to stem from the following:

https://github.com/saltstack/salt/blob/01b9405b61cd416d4c852c87bd484759f5ec9c96/salt/utils/atomicfile.py#L132

Ultimately, return.p is first written as a new temporary file, which I assume is created with read+write permissions for root only. Once the temporary file is written and the context manager finishes with it, the close function is invoked and the temporary file is moved (os.rename on *nix) into the correct job cache location. Because the file is moved rather than copied, the restrictive original permissions are retained.
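The mechanism described above can be demonstrated in isolation. This is a standalone illustration, not Salt's actual code: tempfile.mkstemp always creates files with mode 0o600 regardless of the process umask, and os.rename carries that mode over to the destination unchanged.

```python
import os
import stat
import tempfile

# mkstemp creates the file readable/writable by the owner only (0o600),
# irrespective of the umask.
fd, tmp_path = tempfile.mkstemp(dir=".")
os.close(fd)

mode = stat.S_IMODE(os.stat(tmp_path).st_mode)
print(oct(mode))  # 0o600

# Renaming moves the inode; the permission bits travel with it.
dest = "return.p.demo"
os.rename(tmp_path, dest)
mode_after = stat.S_IMODE(os.stat(dest).st_mode)
print(oct(mode_after))  # still 0o600 -- other users cannot read the file

os.remove(dest)
```

This is exactly the situation in the job cache: the destination directory is world-readable, but the renamed return.p stays 0o600.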

As a temporary workaround, I modified atomicfile.py in two places:

- https://github.com/saltstack/salt/blob/01b9405b61cd416d4c852c87bd484759f5ec9c96/salt/utils/atomicfile.py#L101: instead of os.rename, I use shutil.copyfile
- https://github.com/saltstack/salt/blob/01b9405b61cd416d4c852c87bd484759f5ec9c96/salt/utils/atomicfile.py#L132: after the copy completes, I call os.remove(self._tmp_filename)

I have not fully tested this to identify long-term ramifications, but wanted to highlight a possible fix for others in a similar situation. While using an external job cache would likely be a better long-term solution, the documentation implies this should be possible.

Additionally, there appear to be other non-critical issues. For example, in some scenarios the following occurs when querying a job ID even after the fixes identified above:

[ec2-user@salt ~]$ salt-run jobs.lookup_jid 20191112202005349718
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[ERROR   ] prep_jid could not store a jid after 5 tries.
[ERROR   ] Could not store job cache info. Job details for this run may be unavailable.
salt:
    True

Note the information is still returned, but it appears a new job record is being created and the write fails. Likely unrelated and probably warrants a separate issue, but I wanted to document it here.

Versions Report

Salt Version:
           Salt: 2019.2.2

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.3
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.7.4 (default, Oct  2 2019, 19:30:55)
   python-gnupg: Not Installed
         PyYAML: 4.2
          PyZMQ: 18.1.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2

System Versions:
           dist:   
         locale: UTF-8
        machine: x86_64
        release: 4.14.138-114.102.amzn2.x86_64
         system: Linux
        version: Not Installed
eliasp commented 4 years ago

This is a problem we're seeing in several environments here as well, and IMHO it shows a general architectural issue in the SaltStack CLI tooling. The CLI tools are a hybrid of a local process and remote job execution, so issues like this show up over and over in various places (e.g. handling of keys through salt-key). IMHO, the CLI tooling should continuously move toward doing no local execution at all, and instead merely interact with the master's interface to handle jobs, which are then executed entirely by the master; the CLI should only be a thin wrapper around all this.

Ch3LL commented 4 years ago

I'm able to replicate this when the salt-master/salt-minion processes are started as root. When I start them as the same user it does work, but we want it to work while running Salt via root. Will need to get this fixed up.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

sagetherage commented 4 years ago

not stale

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.

petiepooo commented 4 years ago

Still not stale.

sagetherage commented 4 years ago

@petiepooo no more stalebot -- this is open and will remain so