mitogen-hq / mitogen

Distributed self-replicating programs in Python
https://mitogen.networkgenomics.com/
BSD 3-Clause "New" or "Revised" License
2.32k stars 197 forks source link

"Pending work still existed" error when using running tasks on localhost #704

Open rdghickman opened 4 years ago

rdghickman commented 4 years ago

Created a very simple ansible playbook such as the following:

- name: get timestamp
  shell: "date +%Y-%m-%dT%H-%M-%S"
  register: local_timestamp

Used by a simple playbook:

---
- hosts: localhost
  roles:
    - test

Running it with strategy mitogen_linear results in the execution completing but the following error appearing:

ERROR! [mux 16518] 14:47:49.436676 E mitogen.[local.16555]: Broker(1550): pending work still existed 3 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.

This seems to happen if:

Verbose output:


PLAY RECAP *************************************************************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

[mux  16518] 14:47:46.406836 D mitogen: Waker(fd=6/8): disconnecting
[mux  16518] 14:47:46.407476 D mitogen.service: thread mitogen.Pool.b828.8 exiting gracefully
[mux  16518] 14:47:46.407805 D mitogen.service: thread mitogen.Pool.b828.10 exiting gracefully
[mux  16518] 14:47:46.407978 D mitogen.service: thread mitogen.Pool.b828.9 exiting gracefully
[mux  16518] 14:47:46.408162 D mitogen.service: thread mitogen.Pool.b828.11 exiting gracefully
[mux  16518] 14:47:46.408670 D mitogen.service: thread mitogen.Pool.b828.15 exiting gracefully
[mux  16518] 14:47:46.408922 D mitogen.service: thread mitogen.Pool.b828.12 exiting gracefully
[mux  16518] 14:47:46.409110 D mitogen.service: thread mitogen.Pool.b828.13 exiting gracefully
[mux  16518] 14:47:46.409252 D mitogen.service: thread mitogen.Pool.b828.14 exiting gracefully
[mux  16518] 14:47:46.409422 D mitogen.service: thread mitogen.Pool.b828.16 exiting gracefully
[mux  16518] 14:47:46.409626 D mitogen.service: thread mitogen.Pool.b828.19 exiting gracefully
[mux  16518] 14:47:46.409770 D mitogen.service: thread mitogen.Pool.b828.17 exiting gracefully
[mux  16518] 14:47:46.409928 D mitogen.service: thread mitogen.Pool.b828.22 exiting gracefully
[mux  16518] 14:47:46.410229 D mitogen.service: thread mitogen.Pool.b828.18 exiting gracefully
[mux  16518] 14:47:46.410755 D mitogen.parent: MitogenProtocol(local.16555): requesting child shutdown
[mux  16518] 14:47:46.410985 D mitogen.service: thread mitogen.Pool.b828.20 exiting gracefully
[mux  16518] 14:47:46.411198 D mitogen.service: thread mitogen.Pool.b828.21 exiting gracefully
[mux  16518] 14:47:46.411462 D mitogen.service: thread mitogen.Pool.b828.24 exiting gracefully
[mux  16518] 14:47:46.411643 D mitogen.service: thread mitogen.Pool.b828.23 exiting gracefully
[mux  16518] 14:47:46.411786 D mitogen.service: thread mitogen.Pool.b828.25 exiting gracefully
[mux  16518] 14:47:46.412009 D mitogen.service: thread mitogen.Pool.b828.29 exiting gracefully
[mux  16518] 14:47:46.412405 D mitogen.service: thread mitogen.Pool.b828.28 exiting gracefully
[mux  16518] 14:47:46.412568 D mitogen.service: thread mitogen.Pool.b828.26 exiting gracefully
[mux  16518] 14:47:46.412722 D mitogen.service: thread mitogen.Pool.b828.30 exiting gracefully
[mux  16518] 14:47:46.412877 D mitogen.service: thread mitogen.Pool.b828.1 exiting gracefully
[mux  16518] 14:47:46.413026 D mitogen.service: thread mitogen.Pool.b828.27 exiting gracefully
[mux  16518] 14:47:46.413146 D mitogen.service: thread mitogen.Pool.b828.0 exiting gracefully
[mux  16518] 14:47:46.413291 D mitogen.service: thread mitogen.Pool.b828.2 exiting gracefully
[mux  16518] 14:47:46.413460 D mitogen.service: thread mitogen.Pool.b828.31 exiting gracefully
[mux  16518] 14:47:46.413874 D mitogen.service: thread mitogen.Pool.b828.3 exiting gracefully
[mux  16518] 14:47:46.414013 D mitogen.service: thread mitogen.Pool.b828.6 exiting gracefully
[mux  16518] 14:47:46.414145 D mitogen.service: thread mitogen.Pool.b828.4 exiting gracefully
[mux  16518] 14:47:46.414262 D mitogen.service: thread mitogen.Pool.b828.7 exiting gracefully
[mux  16518] 14:47:46.414426 D mitogen.service: thread mitogen.Pool.b828.5 exiting gracefully
[mux  16518] 14:47:46.431696 D mitogen.[local.16555]: shutdown request from context 0
[mux  16518] 14:47:46.461130 D mitogen.[local.16555]: Broker(1550): shutting down
[mux  16518] 14:47:46.462294 D mitogen.[local.16555]: Waker(fd=3/4): shutting down
[mux  16518] 14:47:46.463429 D mitogen.[local.16555]: Waker(fd=3/4): disconnecting
[mux  16518] 14:47:46.464563 D mitogen.[local.16555]: IoLoggerProtocol(stdout): shutting down
[mux  16518] 14:47:46.465694 D mitogen.[local.16555]: IoLoggerProtocol(stderr): shutting down
[mux  16518] 14:47:46.468316 D mitogen.[local.16555]: MitogenProtocol(parent): shutting down
[mux  16518] 14:47:46.469801 D mitogen.parent.[local.16555]: MitogenProtocol(fork.16558): requesting child shutdown
[mux  16518] 14:47:46.471448 D mitogen.[fork.16558]: shutdown request from context 2
[mux  16518] 14:47:46.472932 D mitogen.[fork.16558]: Broker(3f98): shutting down
[mux  16518] 14:47:46.474087 D mitogen.[fork.16558]: Waker(fd=3/4): shutting down
[mux  16518] 14:47:46.475202 D mitogen.[fork.16558]: Waker(fd=3/4): disconnecting
[mux  16518] 14:47:46.476380 D mitogen.[fork.16558]: IoLoggerProtocol(stdout): shutting down
[mux  16518] 14:47:46.478675 D mitogen.[fork.16558]: IoLoggerProtocol(stderr): shutting down
[mux  16518] 14:47:46.480066 D mitogen.[fork.16558]: MitogenProtocol(parent): shutting down
[mux  16518] 14:47:46.481195 D mitogen.[local.16555]: <Side of fork.16558 fd 15>: empty read, disconnecting
[mux  16518] 14:47:46.482331 D mitogen.parent.[local.16555]: Process fork.16558 pid 16558: exited with return code 0
[mux  16518] 14:47:46.484022 D mitogen.[local.16555]: Context(3, 'fork.16558'): disconnecting
[mux  16518] 14:47:46.485168 D mitogen.route_monitor.[local.16555]: stream fork.16558 is gone; propagating DEL_ROUTE for {3}
[mux  16518] 14:47:46.489319 D mitogen.parent.[local.16555]: Router(Broker(1550)): deleting route to 3
[mux  16518] 14:47:46.490871 D mitogen.route_monitor: firing local disconnect signal for Context(3, 'fork.16558')
[mux  16518] 14:47:46.491999 D mitogen.route_monitor: deleting route to 3 via <Stream local.16555 #3cc0>
[mux  16518] 14:47:46.493110 D mitogen.parent: Router(Broker(b390)): deleting route to 3
[mux  16518] 14:47:46.494236 D mitogen.[local.16555]: MitogenProtocol(fork.16558): disconnecting
ERROR! [mux  16518] 14:47:49.436676 E mitogen.[local.16555]: Broker(1550): pending work still existed 3 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
[mux  16518] 14:47:49.440681 D mitogen.[local.16555]: Broker(1550): force disconnecting <Side of stdout fd 8>
[mux  16518] 14:47:49.442687 D mitogen.[local.16555]: IoLoggerProtocol(stdout): disconnecting
[mux  16518] 14:47:49.444335 D mitogen.[local.16555]: Broker(1550): force disconnecting <Side of stderr fd 10>
[mux  16518] 14:47:49.445760 D mitogen.[local.16555]: IoLoggerProtocol(stderr): disconnecting
[mux  16518] 14:47:49.446977 D mitogen.[local.16555]: Broker(1550): force disconnecting <Side of parent fd 5>
[mux  16518] 14:47:49.449356 D mitogen.[local.16555]: parent stream is gone, dying.
[mux  16518] 14:47:49.450620 D mitogen.[local.16555]: Broker(1550): shutting down
[mux  16518] 14:47:49.451971 D mitogen: <Side of local.16555 fd 76>: empty read, disconnecting
[mux  16518] 14:47:49.453600 D mitogen.parent: PopenProcess local.16555 pid 16555: exited due to signal 15 (SIGTERM)
[mux  16518] 14:47:49.454935 I ansible_mitogen.services: ContextService(): Forgetting Context(2, 'local.16555') due to stream disconnect
[mux  16518] 14:47:49.456486 D mitogen.route_monitor: stream local.16555 is gone; propagating DEL_ROUTE for {2}
[mux  16518] 14:47:49.459012 D mitogen.parent: Router(Broker(b390)): deleting route to 2
[mux  16518] 14:47:49.461478 I ansible_mitogen.services: ContextService(): Forgetting Context(2, 'local.16555') due to stream disconnect
[mux  16518] 14:47:49.463337 D ansible_mitogen.services: ContextService(): attempt to forget unknown Context(2, 'local.16555')
[mux  16518] 14:47:49.465289 D mitogen: MitogenProtocol(local.16555): disconnecting
[mux  16518] 14:47:49.466627 D mitogen: Router(Broker(b390)): stats: 1 module requests in 400 ms, 120 sent (342 ms minify time), 0 negative responses. Sent 630.0 kb total, 5.2 kb avg.
[mux  16518] 14:47:49.470593 D mitogen.service: FileService().on_shutdown()
[top  16512] 14:47:49.477082 D ansible_mitogen.process: multiplexer 0 PID 16518 exited with return code 0```
s1113950 commented 4 years ago

Can you try ansible 2.8.8? Ansible 2.9+ isn't fully supported by Mitogen yet

rdghickman commented 4 years ago

Interestingly I tried it on an Ubuntu 18.0.4.4 system (Ansible 2.9.6) and it was fine. I will try and downgrade Ansible on the WSL Debian box though.

rdghickman commented 4 years ago

Also happens with Ansible 2.7.7 on the WSL Debian box. I am having to conclude that either Python or Mitogen doesn't like something about WSL, but I'm not sure what.

s1113950 commented 4 years ago

We use Azure Devops for part of our testing, and Travis for the other part. From what I can tell it's not possible to test on a WSL setup in Azure Devops https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops 🤔 . Unfortunately I don't have a Windows system to test on either :/ If it were possible to get something like a WSL2 docker image that could work lol but not sure that's even possible. Hopefully another Mitogen contributor has this system and can take a look at this ticket 🤞

rdghickman commented 4 years ago

I do wonder if something like this is the cause, if it's waiting on the termination of a child process that ends up never happening: https://github.com/microsoft/WSL/issues/3766

anxstj commented 3 years ago

Occasionally I see the same error in our CI environment, but I wasn't able to reproduce it.

...
    PLAY RECAP *********************************************************************
    linux-debian-10-ci         : ok=501  changed=179  unreachable=0    failed=0    skipped=256  rescued=0    ignored=0
    linux-debian-10-gl         : ok=492  changed=181  unreachable=0    failed=0    skipped=244  rescued=0    ignored=0
    linux-debian-9-ci          : ok=494  changed=178  unreachable=0    failed=0    skipped=257  rescued=0    ignored=0
ERROR! [mux  774] 10:01:43.881377 E mitogen.[docker.linux-debian-10-gl]: Broker(4e50): pending work still existed 3 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
ERROR! [mux  774] 10:01:46.187997 E mitogen: Broker(65f8): pending work still existed 5 seconds after shutdown began. This may be due to a timer that is yet to expire, or a child connection that did not fully shut down.
 --> Scenario: 'test'
 --> Action: 'idempotence'
ERROR: 
PLAY [all] *********************************************************************

TASK [Gathering Facts] *********************************************************
 fatal: [linux-debian-10-ci]: UNREACHABLE! => {"changed": false, "msg": "Connection timed out.", "unreachable": true}
 fatal: [linux-debian-10-gl]: UNREACHABLE! => {"changed": false, "msg": "Connection timed out.", "unreachable": true}
 ok: [linux-debian-9-ci]
...

It's a Debian 10 docker container with Ansible 2.9.17 and mitogen 0.2.9.