mitogen-hq / mitogen

Distributed self-replicating programs in Python
https://mitogen.networkgenomics.com/
BSD 3-Clause "New" or "Revised" License
2.32k stars 197 forks source link

wait_for_connection TimeoutError on RC 3 #822

Open dsgnr opened 3 years ago

dsgnr commented 3 years ago

It appears wait_for_connection on RC 3.0 does not work. I have also tested this with the master branch, and the same issue occurs.

- name: Wait for server
  wait_for_connection:
    delay: 10
    timeout: 3600
    sleep: 10
  become: false

Responds with the following after a short amount of time (around 5 seconds):

ERROR! [mux  42316] 08:10:56.906799 E mitogen.service: Pool(5c10, size=32, th='mitogen.Pool.5c10.21'): while invoking 'reset' of 'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
  File "/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/mitogen/service.py", line 627, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
  File "/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/mitogen/service.py", line 313, in invoke
    response = self._invoke(method_name, kwargs, msg)
  File "/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/mitogen/service.py", line 299, in _invoke
    ret = method(**kwargs)
  File "/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/ansible_mitogen/services.py", line 198, in reset
    l.get(timeout=30.0)
  File "/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/mitogen/core.py", line 2643, in get
    return self._get_sleep(poller, timeout, block, rsock, wsock, cookie)
  File "/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/mitogen/core.py", line 2679, in _get_sleep
    raise e
mitogen.core.TimeoutError: None
ansible 2.10.5
  config file = /Users/dsgnr/.ansible.cfg
  configured module search path = ['/Users/dsgnr/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/ansible
  executable location = /Users/dsgnr/.virtualenvs/ansible/bin/ansible
  python version = 3.7.9 (v3.7.9:13c94747c7, Aug 15 2020, 01:31:08) [Clang 6.0 (clang-600.0.57)]
✗ ansible-config dump --only-changed |tee
DEFAULT_CALLBACK_WHITELIST(/Users/dsgnr/.ansible.cfg) = ['profile_tasks']
DEFAULT_HOST_LIST(/Users/dsgnr/.ansible.cfg) = ['/Users/dsgnr/ansible_hosts']
DEFAULT_STRATEGY(/Users/dsgnr/.ansible.cfg) = mitogen_linear
DEFAULT_STRATEGY_PLUGIN_PATH(/Users/dsgnr/.ansible.cfg) = ['/Users/dsgnr/.virtualenvs/ansible/lib/python3.7/site-packages/ansible_mitogen/plugins/strategy']
HOST_KEY_CHECKING(/Users/dsgnr/.ansible.cfg) = False
RETRY_FILES_ENABLED(/Users/dsgnr/.ansible.cfg) = False
JiffsMaverick commented 2 years ago

I have same problem.

Ansible: 2.10.16, 2.11.7, 2.12.1 Ansible Host OS: CentOS 7.6 / macOS 11.6.2 Remote Host OS: CentOS 7.6 / Oracle Linux 8.4 Mitogen: 0.3.2

Test playbook:

- hosts: all
  name: test
  become: false
  gather_facts: false

  tasks:
    - name: Wait for system to become reachable over ssh
      ansible.builtin.wait_for_connection:
        connect_timeout: 3
        delay: 0
        sleep: 1
        timeout: 12

Errors:

fatal: [my_remote_host]: FAILED! => {"changed": false, "elapsed": 12, "msg": "timed out waiting for ping module test: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early."}

Errors with -vvvvv:

[task 97384] 19:46:04.899147 D ansible_mitogen.planner: <class 'ansible_mitogen.planner.BinaryPlanner'> rejected 'ansible.legacy.ping'
[task 97384] 19:46:04.899681 D ansible_mitogen.planner: <class 'ansible_mitogen.planner.NewStylePlanner'> accepted 'ansible.legacy.ping' (filename '/path_to_ansible/ansible212/lib/python3.9/site-packages/ansible/modules/ping.py')
[task 97384] 19:46:04.900522 D mitogen.io: Router(Broker(2e50)).add_handler(<bound method Receiver._on_receive of Receiver(Router(Broker(2e50)), 110)>, 110, True)
[task 97384] 19:46:04.901273 D mitogen.io: Latch(0x10fcfe790, size=0, t='mitogen.Pool.1d00.0').get(timeout=None, block=True)
[task 97384] 19:46:04.902573 D mitogen.io: Latch(0x10fcfe790, size=0, t='mitogen.Pool.1d00.1').get(timeout=None, block=True)
[task 97384] 19:46:04.902942 D mitogen.service: Pool(1d00, size=2, th='MainThread'): initialized
[task 97384] 19:46:04.903925 D mitogen.io: Latch(0x10fcfe790, size=0, t='mitogen.Pool.1d00.0')._get_sleep(timeout=None, block=True, fd=44/45)
[task 97384] 19:46:04.908435 D mitogen.io: PollPoller.poll(None)
[task 97384] 19:46:04.909139 D mitogen.io: Latch(0x10fcfe790, size=0, t='mitogen.Pool.1d00.1')._get_sleep(timeout=None, block=True, fd=47/48)
[task 97384] 19:46:04.909829 D mitogen.io: PollPoller.poll(None)
[task 97384] 19:46:05.025414 D mitogen.service: caching small file /path_to_ansible/ansible212/lib/python3.9/site-packages/ansible/modules/ping.py
wait_for_connection: attempting ping module test
[task 97384] 19:46:06.026735 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
wait_for_connection: attempting ping module test
[task 97384] 19:46:07.028123 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
wait_for_connection: attempting ping module test
[task 97384] 19:46:08.029647 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
wait_for_connection: attempting ping module test
[task 97384] 19:46:09.031235 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
wait_for_connection: attempting ping module test
[task 97384] 19:46:10.032965 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
wait_for_connection: attempting ping module test
[task 97384] 19:46:11.035038 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
wait_for_connection: attempting ping module test
[task 97384] 19:46:12.037328 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
[task 97384] 19:46:13.039553 D ansible_mitogen.mixins: _remove_tmp_path(None)
[task 97384] 19:46:13.040407 D ansible_mitogen.mixins: _remove_tmp_path(None)
[task 97384] 19:46:13.041674 D mitogen.parent: starting no-reply function call to 'ssh.xx.xx.xx.xx': mitogen.core.Dispatcher.forget_chain('my_ansible_host_id')
fatal: [my_remote_host]: FAILED! => {
    "changed": false,
    "elapsed": 12,
    "msg": "timed out waiting for ping module test: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early."
}
JiffsMaverick commented 2 years ago

Oh no, I just realized that the problem is only with FQCN ansible.builtin.wait_for_connection. Mitogen works well with 'old' wait_for_connection.

glbyers commented 2 months ago

FWIW, this simple change to mixins.py resolves this;

--- mitogen-0.3.7.a/ansible_mitogen/mixins.py   2024-04-09 08:49:41.000000000 +1000
+++ mitogen-0.3.7/ansible_mitogen/mixins.py     2024-07-19 10:36:14.696669850 +1000
@@ -379,7 +379,10 @@
         # wait_for_connection, the `ping` test from Ansible won't pass because we lost connection
         # clearing out context forces a reconnect
         # see https://github.com/dw/mitogen/issues/655 and Ansible's `wait_for_connection` module for more info
-        if module_name == 'ansible.legacy.ping' and type(self).__name__ == 'wait_for_connection':
+        if module_name == 'ansible.legacy.ping' and type(self).__name__ in [
+                'wait_for_connection',
+                'ansible.legacy.wait_for_connection',
+                'ansible.builtin.wait_for_connection']:
             self._connection.context = None

         self._connection._connect()