mitogen-hq / mitogen

Distributed self-replicating programs in Python
https://mitogen.networkgenomics.com/
BSD 3-Clause "New" or "Revised" License

CallError: exceptions.KeyError: u'namelist' #575

Open thbar opened 5 years ago

thbar commented 5 years ago
TASK [ANXS.postgresql : PostgreSQL | Make sure the PostgreSQL users are present] ***********************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was:     'paths': '\n    '.join(paths),
fatal: [the-host]: FAILED! => 
  msg: Unexpected failure during module execution.
  stdout: ''

I believe an error occurs while Mitogen tries to build an exception error message here:

https://github.com/dw/mitogen/blob/2758c38f4f939b1b71555a24a2e8cd191ec4423b/ansible_mitogen/target.py#L330-L332

The paths key is provided, but the interpolated string expects namelist:

https://github.com/dw/mitogen/blob/2758c38f4f939b1b71555a24a2e8cd191ec4423b/ansible_mitogen/target.py#L90-L99
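
For readers who don't want to follow the links, here is a minimal sketch of this kind of mismatch. The template text and the `format_error` helper are hypothetical stand-ins, not the actual code in target.py; only the `paths`/`namelist` key names come from the traceback.

```python
# Hypothetical message template; the real text in target.py differs.
TEMPLATE = "Unable to find a usable temp directory. Tried:\n    %(namelist)s"


def format_error(paths):
    # The template names 'namelist', but the mapping only supplies 'paths',
    # so the % operator raises KeyError instead of building the message.
    return TEMPLATE % {'paths': '\n    '.join(paths)}


try:
    format_error(['/tmp', '/var/tmp'])
except KeyError as exc:
    # The missing key is the name used in the template.
    missing_key = exc.args[0]
    print('KeyError for key: %s' % missing_key)
```

The KeyError is raised while building the message, so it replaces the error the message was meant to report.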

Sorry I do not have (at this point) the time required to properly edit and anonymize. I'll still post what I have (anonymized short version which I got with -vvv):

The full traceback is:
Traceback (most recent call last):
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 104, in run
    item_results = self._run_loop(items)
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 334, in _run_loop
    res = self._execute(variables=task_vars)
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 566, in _execute
    result = self._handler.run(task_vars=variables)
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/mixins.py", line 116, in run
    return super(ActionModuleMixin, self).run(tmp, task_vars)
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 46, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/mixins.py", line 356, in _execute_module
    self._connection._connect()
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/connection.py", line 721, in _connect
    self._connect_stack(stack)
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/connection.py", line 675, in _connect_stack
    stack=mitogen.utils.cast(list(stack)),
  File "/Volumes/TheClient/mitogen-0.2.6/mitogen/core.py", line 1859, in call_service
    return recv.get().unpickle()
  File "/Volumes/TheClient/mitogen-0.2.6/mitogen/core.py", line 835, in unpickle
    raise obj
CallError: exceptions.KeyError: u'namelist'
  File "<stdin>", line 3107, in _dispatch_one
  File "master:/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/target.py", line 383, in init_child
    good_temp_dir = find_good_temp_dir(candidate_temp_dirs)
  File "master:/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/target.py", line 331, in find_good_temp_dir
    'paths': '\n    '.join(paths),

fatal: [the-host]: FAILED! => 
  msg: Unexpected failure during module execution.
  stdout: ''

(anonymized)

pipenv run ansible-config dump --only-changed
ANSIBLE_PIPELINING(/Volumes/TheClient/ansible-the-client/ansible.cfg) = True
DEFAULT_HOST_LIST(/Volumes/TheClient/ansible-the-client/ansible.cfg) = [u'/Volumes/TheClient/ansible-the-client/inventory']
DEFAULT_LOAD_CALLBACK_PLUGINS(/Volumes/TheClient/ansible-the-client/ansible.cfg) = True
DEFAULT_ROLES_PATH(/Volumes/TheClient/ansible-the-client/ansible.cfg) = [u'/Volumes/TheClient/ansible-the-client/roles', u'/Volumes/TheClient/ansible-the-client/custom_roles']
DEFAULT_STDOUT_CALLBACK(/Volumes/TheClient/ansible-the-client/ansible.cfg) = yaml
DEFAULT_STRATEGY(/Volumes/TheClient/ansible-the-client/ansible.cfg) = mitogen_linear
DEFAULT_STRATEGY_PLUGIN_PATH(/Volumes/TheClient/ansible-the-client/ansible.cfg) = [u'/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/plugins/strategy']
DEFAULT_VAULT_IDENTITY_LIST(/Volumes/TheClient/ansible-the-client/ansible.cfg) = ['vault-the-client-staging', 'vault-the-client-production', 'vault-vagrant']
thbar commented 5 years ago

I understand that fixing this very error will only fix the error reporting, not the underlying issue, which is related to tmp folder handling!

dw commented 5 years ago

So annoying! I have 'fixed' this stupid exception text at least 3 times now :) Change will be on master in ~20 minutes. Thanks a ton for reporting!

dw commented 5 years ago

I'd be curious to know what is wrong with your machine to cause the error. Presumably you are not receiving it when running under regular Ansible? If so, that's a bug

thbar commented 5 years ago

@dw you're welcome on the reporting :smile: No problem: mitogen is making my life so much better that it's totally worth helping to improve it a bit!

I'm definitely not getting the error when mitogen is disabled.

The interesting thing, though, is that this does not happen for all the target machines, only for a few.

I have an idea about what may be the culprit. I'll investigate today and report back.

dw commented 5 years ago

Thanks for investigating. Things to look out for:

Travis is being annoying -- one of the jobs is stuck despite restarting it. The fix is on issue575 branch if you don't want to wait ;)

thbar commented 5 years ago

On the 2 failing hosts, there is indeed a problem finding a non-noexec folder where mitogen would be able to work, and this is caused by the setup.

Rather than changing this (we're trying to move away from exec in some places), I'm now looking for ways to provide a specific folder for tmp here (in a place where I know things will work, e.g. /home/the-ansible-user/tmp).

Is there a way to achieve this @dw ? If you have an idea, please let me know! (I'll dig into the code too).

Also (sidenote), I'm kind of surprised that the beginning of this method does not create the temp dir itself, but rather the directory in which the temp dir is expected to live (and one of these folders is /home/the-ansible-user directly):

https://github.com/dw/mitogen/blob/333151f7fd95e73866a08c577796a5f5206e2d0f/ansible_mitogen/target.py#L263-L281

It won't do much harm, but I would rather not see an attempt to create a folder like /home/the-ansible-user. Is it done on purpose?

dw commented 5 years ago

The makedirs logic is there to approximate the handling in the standard ansible.module_utils, which attempts to create ~/.ansible/tmp by default. I'm very happy to tighten this up so we only ever try to makedirs the same directories as Ansible. Mitogen and Ansible differ heavily in temp file handling: Mitogen tries to have only a single location for the duration of the run, whereas Ansible creates up to 3 directories for every task.

The noexec check was added to avoid picking a filesystem where running non-Python Ansible modules (e.g. written in Go, bash or perl) would fail. I don't think Ansible has any similar check, which means we could potentially just remove the check, or move it into runner.py where it prints a descriptive error if a program run fails.
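
For anyone following along, a check of this kind can be sketched roughly as below. This is an illustration only, not Mitogen's actual implementation: the function names are hypothetical, and it assumes that `os.access(..., os.X_OK)` returns false for a regular file on a noexec mount, which as far as I know holds on Linux but may not hold on every platform.

```python
import os
import stat
import tempfile


def is_usable_temp_dir(path):
    """Return True if a file created in `path` can be marked executable.

    Hypothetical helper: probes the directory by creating a scratch file.
    """
    try:
        fd, name = tempfile.mkstemp(prefix='probe_', dir=path)
    except OSError:
        # Directory is missing or not writable.
        return False
    try:
        os.close(fd)
        os.chmod(name, stat.S_IRWXU)
        # Assumption: on a noexec mount this reports False even though
        # the owner execute bit is set.
        return os.access(name, os.X_OK)
    finally:
        os.unlink(name)


def find_usable_temp_dir(candidates):
    """Return the first candidate that passes the probe, else None."""
    for path in candidates:
        if is_usable_temp_dir(path):
            return path
    return None
```

Removing the check would amount to dropping the `os.access` test and accepting any directory where the scratch file can be created.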

Unfortunately bug #321 makes no reference to why the noexec check was added. It's quite a specific check, and I'm not sure I added it 'simply because'. It might have been due to an issue reported via IRC.

I'm tempted to simply disable the noexec check and wait for bug reports :)

thbar commented 5 years ago

Thanks for the context - it's rich, as expected for such a project :smile:

At least on my setups, if I remove the noexec check, things just run smoothly...

I wonder why it was added, too, in the first place, and I certainly don't want to rush you into changing this too hastily, yet waiting for bug reports could be a good way to figure out the why.

thbar commented 5 years ago

Re-reading #321, I wonder if you could find a way to avoid trying to replicate exactly what Ansible does, because it seems to be sooo complicated (but I realise this is a naive view, as a total newcomer to the project). I have not pondered the implications, though!

thbar commented 5 years ago

One last note for today: if I focus on the failing hosts, and disable the noexec check, I notice that for each host, 2 different tmp paths are mentioned in the logs:

$ pipenv run ansible-playbook build-all.yml --tags focus --diff --check --limit sv-tca-fluite99,sv-tca-geoite03 -vvv | grep "Selected"
[mux  25184] 17:30:13.473724 D mitogen.ctx.ssh.$$FIRST-IP$$: ansible_mitogen.target: Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/home/deploy/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux  25184] 17:30:16.412597 D mitogen.ctx.ssh.$$FIRST-IP$$.sudo.postgres: ansible_mitogen.target: Selected temp directory: u'/var/lib/postgresql/.ansible/tmp' (from [u'/var/lib/postgresql/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux  25184] 17:30:19.125581 D mitogen.ctx.ssh.$$SECOND-IP$$: ansible_mitogen.target: Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/home/deploy/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux  25184] 17:30:21.784794 D mitogen.ctx.ssh.$$SECOND-IP$$.sudo.postgres: ansible_mitogen.target: Selected temp directory: u'/var/lib/postgresql/.ansible/tmp' (from [u'/var/lib/postgresql/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp/user/114', '/tmp/user/114', '/tmp/user/114', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])

It must be noted that for a given host, the first line of log will mention /home/deploy/.ansible/tmp, whereas the second one (called from I think the ANXS.postgresql role), will attempt to use /var/lib/postgresql/.ansible/tmp.

On both hosts, this second folder would not be authorized with the noexec check, because on those two machines, /var is noexec.
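
In case it helps with comparing hosts, here is a small sketch that pulls the context name and the selected directory out of log lines like the ones above. The regular expression is only fitted to the lines quoted in this comment, so treat the exact log format as an assumption; `selected_dirs` is a hypothetical helper name.

```python
import re

# Fitted to the "Selected temp directory" debug lines quoted above.
LINE_RE = re.compile(
    r"mitogen\.ctx\.(?P<context>\S+): ansible_mitogen\.target: "
    r"Selected temp directory: u?'(?P<path>[^']+)'"
)


def selected_dirs(lines):
    """Map each context name (host, or host plus become account) to its dir."""
    result = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            result[match.group('context')] = match.group('path')
    return result


sample = [
    "[mux  1] 17:30:13.4 D mitogen.ctx.ssh.host-a: ansible_mitogen.target: "
    "Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/tmp'])",
    "[mux  1] 17:30:16.4 D mitogen.ctx.ssh.host-a.sudo.postgres: "
    "ansible_mitogen.target: Selected temp directory: "
    "u'/var/lib/postgresql/.ansible/tmp' (from [u'/tmp'])",
]
for context, path in sorted(selected_dirs(sample).items()):
    print(context, path)
```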

So well - not sure if this is useful at this point, but mentioning this in case it helps later.

dw commented 5 years ago

Temp file handling is a complete nightmare :) It has been rewritten at least 3 or 4 times already, and every tweak breaks some install somewhere, or some older version of Ansible. It would be much better if no temp files were used at all (as in the original prototype), but far too many modules expect a temp dir to exist.

For example, when pipelining=False and become=true, Ansible still creates temp files in the SSH login account and sets perms so the become account can read them. Mitogen always keeps temp files within the target account, which meant the become user homedir must be writeable, breaking some installs. That's why Mitogen has quite a long list of candidate temp dirs: this approach is known to break working temp dir setups in some cases, so we try hard to find any working configuration rather than raise an error.

Re: your last comment, sorry, it is one temp dir per account, not per run.

thbar commented 5 years ago

Got you! Happy to provide more testing if needed, just hit me up!