Open Gaibhne opened 4 years ago
auto_silent is an ansible_python_interpreter value related to interpreter discovery, which Mitogen currently doesn't support. @Gaibhne can you please try my patch here: https://github.com/dw/mitogen/pull/658 and let me know if it works for you? It supports auto_silent as the Python interpreter.
@s1113950 thanks for your reply. I tried it, and it worked ... somewhat. Generally, it seems to work, but I get errors on about 25% of my hosts (randomly, it seems; the same host will sometimes work and sometimes crash). I am attaching two crash logs of two different servers (both of which worked fine in other runs). The problem seems to mostly (always ?) occur in the 'Gathering Facts' phase.
ERROR! [mux 16864] 09:43:27.685369 E mitogen: <Stream ssh.omnibus.company.com #fcd0> crashed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 3481, in _call
func(self)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1719, in on_transmit
self.protocol.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2167, in on_transmit
self._writer.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1907, in on_transmit
written = self._protocol.stream.transmit_side.write(buf)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2033, in write
written, disconnected = io_op(os.write, self.fd, s)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 553, in io_op
return func(*args), None
OSError: [Errno 11] Resource temporarily unavailable
fatal: [omnibus]: UNREACHABLE! => {"changed": false, "msg": "Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected", "unreachable": true}
And:
ERROR! [mux 17684] 09:54:50.278387 E mitogen: <Stream ssh.10.100.1.60 #cfd0> crashed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 3481, in _call
func(self)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1719, in on_transmit
self.protocol.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2167, in on_transmit
self._writer.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1907, in on_transmit
written = self._protocol.stream.transmit_side.write(buf)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2033, in write
written, disconnected = io_op(os.write, self.fd, s)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 553, in io_op
return func(*args), None
OSError: [Errno 11] Resource temporarily unavailable
fatal: [alkoholix]: UNREACHABLE! => {"changed": false, "msg": "Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected", "unreachable": true}
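For reference, Errno 11 on Linux is EAGAIN, which the kernel returns when a write to a non-blocking file descriptor cannot proceed, for example because the buffer is full. The following is a minimal standalone sketch of that condition using an ordinary pipe. It is not Mitogen's code and does not claim to reproduce this bug; the function name fill_nonblocking_pipe is made up for illustration, and it assumes Python 3.5+ on a Unix-like system.

```python
import errno
import os


def fill_nonblocking_pipe():
    """Write to a non-blocking pipe until the kernel refuses; return the errno.

    Hypothetical helper for illustration only, unrelated to Mitogen's internals.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # writes now fail instead of waiting
    try:
        chunk = b"x" * 4096
        while True:
            try:
                # Nobody reads from read_fd, so the pipe buffer eventually fills.
                os.write(write_fd, chunk)
            except OSError as exc:
                return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    err = fill_nonblocking_pipe()
    print(err == errno.EAGAIN, os.strerror(err))
```

In the tracebacks above, the same kind of OSError escapes from os.write inside io_op, which suggests the descriptor was in non-blocking mode and not ready for writing at that moment.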
Ok! It's a start 🤔 can you dump more output from a run with -vvv? It's not immediately clear to me why it would work sometimes and not other times
I made more tweaks to my patch to make interpreter discovery smarter. Can you try it again and see if it works 100% of the time @Gaibhne ?
I think I closed this prematurely, sorry about that. @Gaibhne please give latest master of mitogen another try and see if your issue still persists
Unfortunately you are right. I just tried with a5fe4a9fac5561511b676fe61ed127b732be5b12, which is the current master, and got the following result during gather facts, and now it seems all hosts are completely broken. Additionally, the process hangs forever after all the errors:
ERROR! [mux 775] 12:25:54.972593 E mitogen: <Stream ssh.omnibus.company.com #24d0> crashed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 3481, in _call
func(self)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1719, in on_transmit
self.protocol.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2167, in on_transmit
self._writer.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1907, in on_transmit
written = self._protocol.stream.transmit_side.write(buf)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2033, in write
written, disconnected = io_op(os.write, self.fd, s)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 553, in io_op
return func(*args), None
OSError: [Errno 11] Resource temporarily unavailable
fatal: [omnibus]: UNREACHABLE! => {"changed": false, "msg": "Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected", "unreachable": true}
Can you try with ansible version 2.8.8? Ansible 2.9+ isn't fully supported yet
Just tried with 2.8.8, no joy:
ERROR! [mux 1618] 21:16:44.610331 E mitogen: <Stream ssh.omnibus.company.com #1750> crashed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 3481, in _call
func(self)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1719, in on_transmit
self.protocol.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2167, in on_transmit
self._writer.on_transmit(broker)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 1907, in on_transmit
written = self._protocol.stream.transmit_side.write(buf)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 2033, in write
written, disconnected = io_op(os.write, self.fd, s)
File "/usr/lib/python2.7/site-packages/mitogen-0.2.9-py2.7.egg/mitogen/core.py", line 553, in io_op
return func(*args), None
OSError: [Errno 11] Resource temporarily unavailable
fatal: [omnibus]: UNREACHABLE! => {"changed": false, "msg": "Mitogen was disconnected from the remote environment while a call was in-progress. If you feel this is in error, please file a bug. Original error was: the respondent Context has disconnected", "unreachable": true}
Shoot, ok. Can you post a minimally-reproducible playbook for me to play with? I test inside Centos7 docker images and things work for me
Also, for sure it's not related to a proxy or anything? Can you connect to all the machines manually?
Since it happens even during the "Gathering Facts" step, I suspect no aspects of the playbooks themselves are responsible. I have two more observations that I think may be helpful:
-vvv for you. Imagine my surprise when I got a 100% success rate on every single host, reproducibly. Is it possible that the problem is not the target hosts, but something on the machine running Ansible itself? I have an output with -vv, which still produced the problems, as well as one with -vvv that runs without problems, but I don't feel comfortable posting so much data in public. Is there a non-public mechanism to provide you with the dumps?
hmmm interesting, I remember that happening to me before as well for something unrelated (where I added a different amount of -v and it worked).
You can make a private repo of the dump and then invite me to it :)
I have the same issue with centos 7 servers. The problem is not in my playbooks.
If I just launch ansible all -i hosts -m ping -v:
ansible 2.7.10
config file = /usr/local/ansible/ansible.cfg
configured module search path = [u'/home/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Aug 7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
I have the latest version of mitogen.
My ansible.cfg:
[defaults]
inventory = hosts
debug = dark gray
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ./tmp/
strategy_plugins = ./mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
callback_plugins=/usr/lib/python2.7/site-packages/ara/plugins/callbacks
action_plugins=/usr/lib/python2.7/site-packages/ara/plugins/actions
[ara]
ARA_DIR=/DATA/ara
ARA_HOST=0.0.0.0
[colors]
verbose = bright blue
[ssh_connection]
scp_if_ssh = True
transfer_method = scp
sftp_batch_mode = False
Hi!
I am not sure if that helps and I am catching the train a bit late.
But I had the same random problem and I resolved it by specifying the ansible_python_interpreter in my hosts file:
[all:vars]
ansible_python_interpreter=/usr/bin/python3
[db]
mariadb01 ansible_python_interpreter=/usr/bin/python
mariadb02
[www]
www01
Hope that helps!
Hello,
I tried your workaround, fauust, but it changed nothing for me.
I can finally contribute something new, @s1113950! Today, I tried a fresh pull, since I figured maybe something committed in the meantime fixed it. Lo and behold, it worked just fine! However, just as I was about to close this ticket, I had to ctrl-c a running playbook, and right after that, it started happening again.
Hopefully that little detail helps narrow down the scope of the issue.
I have recently switched to a new CentOS 8 VM and it is not happening there. If nothing else, this proves that the problem definitely is not on the target computers, as I am using the same playbooks and inventory. As I can no longer reproduce this issue, I'm leaving it up to maintainers to close it if you want.
Sorry for the somewhat sensational title, but there's really no other way to put it. It simply does not work, and gives no helpful error, nor anything I could debug. On all our targets, I get the following when trying to run Ansible with Mitogen enabled:
fatal: [hostname]: UNREACHABLE! => {"changed": false, "msg": "EOF on stream; last 100 lines received:\nbash: auto_silent: command not found", "unreachable": true}
Google gives nothing for the error. I have tried rebooting.
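The message is consistent with the literal string auto_silent being run as a command where a Python interpreter path was expected, as the first reply in this thread describes. The sketch below only illustrates that reading: it is not Mitogen's code, the helper name run_with_interpreter is hypothetical, and it assumes bash is installed and no command named auto_silent exists on the PATH.

```python
import os
import subprocess


def run_with_interpreter(interpreter, code="print('ok')"):
    """Run `<interpreter> -c <code>` through bash; return (exit status, stderr).

    Hypothetical helper for illustration; it does not reflect how Mitogen
    actually launches the remote interpreter.
    """
    result = subprocess.run(
        # $0 is "bash", $1 is the interpreter name, $2 is the code string.
        ["bash", "-c", '"$1" -c "$2"', "bash", interpreter, code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        env=dict(os.environ, LC_ALL="C"),  # keep bash's message in English
    )
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    status, message = run_with_interpreter("auto_silent")
    print(status, message)
```

With an unresolved value such as auto_silent, bash reports that the command was not found and exits with status 127, which matches the text embedded in the UNREACHABLE message above.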
ansible 2.9.2 and mitogen-0.2.9.
No.
module_utils loaded? Not that I know of.
I don't know. It is not clear to me how I would go about building it; there seems to be no documentation, and I am not familiar with the ecosystem used. I have tried linking to the contained ansible_mitogen/plugins/strategy but that did not fix the issue.
No, other than that all my target hosts are CentOS 7 machines. I have included the debug output of mitogen_get_stack that includes the 'auto_silent' string, but I don't understand the significance:
CentOS 7, host and targets.
Python 2.7.5 on all machines