Open thbar opened 5 years ago
I understand that fixing this very error will only fix the error reporting, not the underlying issue, which is related to tmp folder handling!
So annoying! I have 'fixed' this stupid exception text at least 3 times now :) Change will be on master in ~20 minutes. Thanks a ton for reporting!
I'd be curious to know what is wrong with your machine to cause the error. Presumably you are not receiving it when running under regular Ansible? If so, that's a bug.
@dw you're welcome for the report :smile: No problem: mitogen is making my life so much better that it's totally worth helping to improve it a bit!
I'm definitely not getting the error when mitogen is disabled.
The interesting thing, though, is that this does not happen for all the target machines, only for a few.
I have an idea about what may be the culprit. I'll investigate today and report back.
Thanks for investigating. Things to look out for:
Travis is being annoying -- one of the jobs is stuck despite restarting it. The fix is on the issue575 branch if you don't want to wait ;)
On the 2 failing hosts, there is indeed a problem finding a non-`noexec` folder where mitogen would be able to work, and this is caused by the setup.
Rather than changing this (we're trying to move away from `exec` in some places), I'm now looking for ways to provide a specific folder for tmp here (in a place where I know things will work, e.g. `/home/the-ansible-user/tmp`).
Is there a way to achieve this @dw ? If you have an idea, please let me know! (I'll dig into the code too).
Also (sidenote), I'm kind of surprised that the beginning of this method will create not the temp dir itself, but the place where the temp dir seems to be expected (and one of these folders is `/home/the-ansible-user` directly):
It won't do much harm, but I would rather not see an attempt to create a folder like `/home/the-ansible-user`. Is it done on purpose?
The makedirs logic is to approximate the handling of the standard `ansible.module_utils`, which attempts to create `~/.ansible/tmp` by default. I'm very happy to tighten this up so we only ever try to makedirs the same directories as Ansible. Mitogen and Ansible differ heavily in temp file handling: Mitogen tries to have only a single location for the duration of the run, whereas Ansible creates up to 3 directories for every task.
The noexec check was added to avoid picking a filesystem where running non-Python Ansible modules (e.g. written in Go, bash or perl) would fail. I don't think Ansible has any similar check, which means we could potentially just remove the check, or move it into `runner.py` where it prints a descriptive error if a program run fails.
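For readers unfamiliar with what such a check involves, here is a minimal sketch of one way to probe a directory for noexec. It is not Mitogen's actual code: the function name `can_execute_in` and the probe-file prefix are made up for illustration, and it assumes (as is the case on Linux) that `os.access(..., os.X_OK)` reports failure for regular files on a filesystem mounted noexec.

```python
import os
import tempfile


def can_execute_in(path):
    """Return True if a file created in `path` can be made executable.

    Hypothetical helper for illustration only. Returns False when the
    directory is missing, unwritable, or (on Linux) mounted noexec.
    """
    try:
        tmp = tempfile.NamedTemporaryFile(prefix='noexec_probe_', dir=path)
    except (OSError, IOError):
        # Directory does not exist, is not a directory, or is not writable.
        return False

    with tmp:  # the probe file is deleted when the block exits
        try:
            os.chmod(tmp.name, 0o700)
        except OSError:
            return False
        # Assumption: the kernel answers "no" here on a noexec mount.
        return os.access(tmp.name, os.X_OK)


if __name__ == '__main__':
    for candidate in ('/tmp', '/var/tmp', os.path.expanduser('~')):
        print(candidate, can_execute_in(candidate))
```

A probe like this only says whether execution looks possible; it cannot tell whether a given task will ever need to run a non-Python module from that directory, which is the trade-off being weighed above.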
Unfortunately bug #321 makes no reference to why the noexec check was added. It's quite a specific check, and I'm not sure I added it 'simply because'. It might have been due to an issue reported via IRC.
I'm tempted to simply disable the noexec check and wait for bug reports :)
Thanks for the context - it's rich, as expected for such a project :smile:
At least on my setups, if I remove the `noexec` check, things just run smoothly...
I wonder why it was added, too, in the first place, and I certainly don't want to rush you into changing this too hastily, yet waiting for bug reports could be a good way to figure out the why.
Re-reading #321, I wonder if you could find a way to avoid trying to replicate exactly what Ansible does, because it seems to be sooo complicated (but I realise this is a naive view, as a total newcomer to the project). I have not pondered the implications, though!
One last note for today: if I focus on the failing hosts, and disable the `noexec` check, I notice that for each host, 2 different tmp paths are mentioned in the logs:
$ pipenv run ansible-playbook build-all.yml --tags focus --diff --check --limit sv-tca-fluite99,sv-tca-geoite03 -vvv | grep "Selected"
[mux 25184] 17:30:13.473724 D mitogen.ctx.ssh.$$FIRST-IP$$: ansible_mitogen.target: Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/home/deploy/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux 25184] 17:30:16.412597 D mitogen.ctx.ssh.$$FIRST-IP$$.sudo.postgres: ansible_mitogen.target: Selected temp directory: u'/var/lib/postgresql/.ansible/tmp' (from [u'/var/lib/postgresql/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux 25184] 17:30:19.125581 D mitogen.ctx.ssh.$$SECOND-IP$$: ansible_mitogen.target: Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/home/deploy/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux 25184] 17:30:21.784794 D mitogen.ctx.ssh.$$SECOND-IP$$.sudo.postgres: ansible_mitogen.target: Selected temp directory: u'/var/lib/postgresql/.ansible/tmp' (from [u'/var/lib/postgresql/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp/user/114', '/tmp/user/114', '/tmp/user/114', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
Note that for a given host, the first log line mentions `/home/deploy/.ansible/tmp`, whereas the second one (called, I think, from the ANXS.postgresql role) will attempt to use `/var/lib/postgresql/.ansible/tmp`.
On both hosts, this second folder would be rejected by the `noexec` check, because on those two machines, `/var` is `noexec`.
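One way to confirm which mount a candidate path lands on, and whether that mount carries `noexec`, is to read the mount table. The sketch below assumes the Linux `/proc/mounts` layout (device, mount point, type, comma-separated options). The function name `mount_options` and the sample table are invented for illustration and are not taken from the affected hosts. It also ignores escaped characters in mount points.

```python
def mount_options(mounts_text, path):
    """Return (mount_point, options) for the mount that contains `path`.

    Picks the longest mount point that is a path prefix of `path`.
    Hypothetical helper; `mounts_text` uses the /proc/mounts layout.
    """
    best_point, best_opts = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        point, opts = fields[1], fields[3].split(',')
        prefix = point.rstrip('/') + '/'
        if path == point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win.
            if best_point is None or len(point) >= len(best_point):
                best_point, best_opts = point, opts
    return best_point, best_opts


# Made-up mount table resembling the situation described above.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda2 /var ext4 rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

point, opts = mount_options(SAMPLE, '/var/lib/postgresql/.ansible/tmp')
print(point, 'noexec' in opts)  # prints: /var True
```

On a real target, the same function could be fed the contents of `/proc/mounts` instead of the sample string.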
So well - not sure if this is useful at this point, but mentioning this in case it helps later.
Temp file handling is a complete nightmare :) It has been rewritten at least 3 or 4 times already, and every tweak breaks some install somewhere, or some older version of Ansible. It would be much better if no temp files were used at all (as in the original prototype), but far too many modules expect a temp dir to exist.
For example with Ansible, when pipelining=False and become=true, Ansible still creates temp files in the SSH login account and sets perms so the become account can read them. Mitogen always keeps temp files within the target account -- which meant the become user homedir must be writeable, breaking some installs. That's why Mitogen has quite a huge list of candidate temp dirs -- it is known in some cases to break working temp dir setups, so we try hard to find any working configuration rather than error.
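The "try each candidate until one works" idea can be sketched as follows. This is only an illustration of the general approach, not the code in `ansible_mitogen/target.py`: the names `is_writable_dir` and `select_temp_dir` are hypothetical, and the only acceptance test here is "a file can be created", with no noexec check.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a temporary file can be created inside `path`."""
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except (OSError, IOError):
        return False


def select_temp_dir(candidates):
    """Return the first usable directory from `candidates`, or None.

    Hypothetical sketch. Note that os.makedirs() also creates missing
    parent directories, which is the side effect questioned earlier in
    this thread.
    """
    for path in candidates:
        path = os.path.expanduser(path)
        if not os.path.isdir(path):
            try:
                os.makedirs(path, mode=0o700)
            except OSError:
                continue  # cannot create it; try the next candidate
        if is_writable_dir(path):
            return path
    return None


if __name__ == '__main__':
    print(select_temp_dir(['~/.ansible/tmp', '/var/tmp', '/tmp']))
```

Returning `None` leaves it to the caller to decide whether to raise an error listing everything that was tried.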
Re: your last comment, sorry, it is one temp dir per account, not per run.
Got you! Happy to provide more testing if needed, just hit me up!
Ansible version: 2.5.14 (not patched nor running custom modules)
Mitogen: v0.2.6 (I did not run master, but checked the diff and did not notice anything that would fix this I think - but can try later)
Idea of what the underlying problem may be:
I believe an error occurs while Mitogen tries to build an exception error message here:
https://github.com/dw/mitogen/blob/2758c38f4f939b1b71555a24a2e8cd191ec4423b/ansible_mitogen/target.py#L330-L332
The `paths` key is provided, but the interpolated string expects `namelist`: https://github.com/dw/mitogen/blob/2758c38f4f939b1b71555a24a2e8cd191ec4423b/ansible_mitogen/target.py#L90-L99
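The failure mode described here is easy to reproduce in isolation. In the sketch below the message text and the helper `missing_key` are invented for illustration. Only the key names `paths` and `namelist` come from the report above. A `%(name)s` template formatted with a dict that lacks `name` raises `KeyError`, so the intended error message is never produced.

```python
# Hypothetical template; the real message text in Mitogen may differ.
MSG = 'No usable temporary directory found in: %(namelist)s'


def missing_key(template, mapping):
    """Return the key the template needs but `mapping` lacks, else None."""
    try:
        template % mapping
    except KeyError as exc:
        return exc.args[0]
    return None


# Caller supplies "paths" while the template asks for "namelist".
print(missing_key(MSG, {'paths': ['/tmp', '/var/tmp']}))  # prints: namelist

# With the expected key, formatting succeeds.
print(MSG % {'namelist': ', '.join(['/tmp', '/var/tmp'])})
```

Because the `KeyError` is raised while the error message is being built, it hides the original problem (no usable temp dir), which matches the symptom reported in this issue.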
Host: pipenv 9.0.1, Python 2.7.14, Mac OS X 10.14.3
Target: Python 2.7.12, Ubuntu 16.04.4 LTS
If reporting a crash or hang in Ansible, please rerun with -vvv and include 200 lines of output around the point of the error, along with a full copy of any traceback or error text in the log. Beware "-vvv" may include secret data! Edit as necessary before posting.
Sorry, I do not have (at this point) the time required to properly edit and anonymize. I'll still post what I have (anonymized short version which I got with `-vvv`): (anonymized)