Closed rouge8 closed 6 years ago
Probably something super simple; I haven't been testing against >1 target at all (at least not the Ansible bits). Will set up a reproduction tomorrow. Thanks for the report!
First attempt to replicate this against 23 shared vCPU Google Cloud targets failed, but I can see some weird performance characteristics (sometimes it runs through quickly, other times much more slowly).
Thanks for the report; I may need to defer this one until I get back to the UK, because the only place I have good gcloud quota is eu-west-1, and latency to there from here is basically unusable.
Let me know if there's any more debug logging I can provide; otherwise I'll be ready to re-test when you think you have a fix.
I'm using this as a quick (and therapeutic) excuse to expose the existing Docker connection support.. process model won't change, but means I can have 100 targets on my laptop ;)
+1 Seeing the same behavior when running with more than 5-10 hosts at once: it hangs forever at a random place.
Ah @rouge8 one last question: the controller machine, how many cores and how many (hardware) threads, and otherwise how loaded is it?
OS X 10.11.6 controller, MBP with 2 cores / 4 threads. Screenshots from Activity Monitor attached. Wasn't doing too much when running Ansible other than the ~300 Chrome tabs open.
Sounds like we have the same laptop :) Thanks
Just to confirm I have reproduced it. Workers all sitting there like dummies, top-level process burning 20% CPU doing.. mysterious things.
The 20% CPU in the parent process is Ansible polling at 1ms intervals on results.
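For illustration, here's a toy model of why a 1ms poll interval shows up as constant CPU in the parent (my own sketch, not Ansible's actual code; the real loop is the _wait_on_pending_results frame visible in the traceback below):

```python
import queue
import time

POLL_INTERVAL = 0.001  # mirrors Ansible's DEFAULT_INTERNAL_POLL_INTERVAL (1ms)

def wait_on_pending_results(results_queue, pending):
    # Toy version of the strategy loop: drain any finished worker results,
    # sleep 1ms, repeat -- roughly 1000 wakeups per second even when every
    # worker is idle, which is where the parent's constant CPU use comes from.
    results = []
    while pending:
        while not results_queue.empty():
            results.append(results_queue.get())
            pending -= 1
        time.sleep(POLL_INTERVAL)
    return results

q = queue.Queue()
q.put({"host": "mydeb9-1", "changed": False})
print(wait_on_pending_results(q, pending=1))
```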
Haven't found it yet, but interestingly there are 3 post-fork WorkerProcesses for which no corresponding /tmp/mitogen.*.log
file exists. The waiting done by Ansible may well be on results these processes are clearly never going to generate.
[02:18:51 Eldil!5 ~] pstree -s python
-+= 00001 root /sbin/launchd
|--- 42843 dmw (python.exe)
\-+= 89248 dmw /Applications/iTerm.app/Contents/MacOS/iTerm2
\-+= 01987 dmw bash --login
\-+= 12208 dmw /Users/dmw/src/mitogen/.venv/bin/python2.7 /Users/dmw/src/mitogen/.venv/bin/ansible-playbook -i hosts.docker -l mydeb9-1* issue_131.yml -vvvv
|--- 12219 dmw /Users/dmw/src/mitogen/.venv/bin/python2.7 /Users/dmw/src/mitogen/.venv/bin/ansible-playbook -i hosts.docker -l mydeb9-1* issue_131.yml -vvvv
|--- 12220 dmw docker exec -i mydeb9-1 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGckxoDIYEE3L3oT60JTiWUkQUSchO1fbru8GBKr3t
|--- 12224 dmw /Users/dmw/src/mitogen/.venv/bin/python2.7 /Users/dmw/src/mitogen/.venv/bin/ansible-playbook -i hosts.docker -l mydeb9-1* issue_131.yml -vvvv
|--- 12225 dmw docker exec -i mydeb9-10 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGckxoDIYEE3L3oT60JTiWUkQUSchO1fbru8GBKr3
|--- 12226 dmw /Users/dmw/src/mitogen/.venv/bin/python2.7 /Users/dmw/src/mitogen/.venv/bin/ansible-playbook -i hosts.docker -l mydeb9-1* issue_131.yml -vvvv
|--- 12229 dmw docker exec -i mydeb9-11 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr3
|--- 12230 dmw docker exec -i mydeb9-12 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr3
|--- 12231 dmw docker exec -i mydeb9-14 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr3
|--- 12232 dmw docker exec -i mydeb9-15 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr3
|--- 12233 dmw docker exec -i mydeb9-16 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr3
|--- 12234 dmw docker exec -i mydeb9-19 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr3
\--- 12235 dmw docker exec -i mydeb9-100 /usr/bin/python2.7 -c import codecs,os,sys;_=codecs.decode;exec(_(_("eNpdj8FqwzAQRM/1V/S2EhVGcgxNDYYGE3L3IT60pTjWpogqkpCdqunXV8aBKr
# PID 12208 ThreadID: (MainThread) 140735838786496; <frame object at 0x7fc74282aab0>
File: "/Users/dmw/src/mitogen/.venv/bin/ansible-playbook", line 106, in <module>
exit_code = cli.run()
File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/cli/playbook.py", line 130, in run
results = pbex.run()
File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 154, in run
result = self._tqm.run(play=play)
File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 290, in run
play_return = strategy.run(iterator, play_context)
File: "/Users/dmw/src/mitogen/ansible_mitogen/strategy.py", line 188, in run
return super(StrategyModule, self).run(iterator, play_context)
File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 292, in run
results += self._wait_on_pending_results(iterator)
File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 589, in _wait_on_pending_results
time.sleep(C.DEFAULT_INTERNAL_POLL_INTERVAL)
# PID 12208 ThreadID: (mitogen.master.join_thread_async) 123145427001344; <frame object at 0x7fc74281ba40>
File: "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File: "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File: "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File: "/Users/dmw/src/mitogen/mitogen/master.py", line 144, in _watch
target_thread.join()
--- then
+++ now
@@ -47,14 +47,8 @@
return super(StrategyModule, self).run(iterator, play_context)
File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 292, in run
results += self._wait_on_pending_results(iterator)
-File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 583, in _wait_on_pending_results
- if self._tqm.has_dead_workers():
-File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 341, in has_dead_workers
- if hasattr(x[0], 'exitcode'):
-File: "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 203, in exitcode
- return self._popen.poll()
-File: "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/forking.py", line 135, in poll
- pid, sts = os.waitpid(self.pid, flag)
+File: "/Users/dmw/src/mitogen/.venv/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 589, in _wait_on_pending_results
+ time.sleep(C.DEFAULT_INTERNAL_POLL_INTERVAL)
# PID 12208 ThreadID: (mitogen.master.join_thread_async) 123145427001344; <frame object at 0x7fc74281ba40>
File: "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 774, in __bootstrap
And what a surprise, the first worker I look at is trying to take a lock. Yay for mixing threading/forking. This one is likely to be fun
The lock in question is in the logging package, and quite possibly it's a lock for Mitogen's own log handler. Funsies.
So basically what happens is: you have two threads, one of which is actively logging a message (as the Mitogen IO multiplexer thread likes to do), and another (such as the Ansible main thread) calls fork. In the child process (a worker process actually implementing an Ansible playbook step), a new Mitogen IO multiplexer is created and of course it starts logging again, but in the duplicate of the parent's process image the child received, the logging code finds a lock it wants to take already marked as held, by a thread in the parent that no longer exists in the child. Tada: deadlock.
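To make the failure mode concrete, here's a minimal standalone sketch (my own illustration, not code from Ansible or Mitogen) of the same race: a background thread is busy logging when the main thread forks, so the child inherits a handler lock that may be marked held and its first log call can hang:

```python
import logging
import os
import threading
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")

def spam():
    # Background thread that spends most of its time inside Handler.emit(),
    # holding the handler's threading lock.
    while True:
        log.info("background message")

t = threading.Thread(target=spam)
t.daemon = True
t.start()
time.sleep(0.2)  # let the thread get going before we fork

pid = os.fork()  # Unix-only, like the Ansible/Mitogen setup in this issue
if pid == 0:
    # Child: its copy of the handler lock may already be marked as held by a
    # parent thread that doesn't exist here, so this call can block forever.
    log.info("child message")
    os._exit(0)
os.waitpid(pid, 0)  # parent sits here for as long as the child is stuck
```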
So the obvious solution is to bring forward the multithreaded connect work ( https://github.com/dw/mitogen/issues/144 ), since it already entails pre-forking one connection multiplexer process per CPU long before Ansible starts forking off its WorkerProcesses.
The forking done inside those connection multiplexer processes is generally fine: the only forks occur inside create_child() and tty_create_child(), and they do not log, take or release locks, or do anything other than replace the forked Python image with the ssh/sudo/docker command to be executed.
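In other words, the safe pattern is fork-then-exec with nothing in between. Something like this hypothetical helper (illustrative only; create_child()'s real implementation differs):

```python
import os

def spawn(argv):
    # Hypothetical helper in the spirit of create_child(): fork, then exec
    # immediately. Between fork() and execvp() the child does no logging and
    # touches no locks, so any lock state copied from the parent is irrelevant.
    pid = os.fork()
    if pid == 0:
        os.execvp(argv[0], argv)  # replaces the forked Python image entirely
    return pid

# In the real code argv would be the ssh/sudo/docker command; "echo" here is
# just a stand-in so the sketch runs anywhere.
child = spawn(["echo", "hello from the child"])
os.waitpid(child, 0)
```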
Perhaps I won't go whole hog on implementing that feature, but I will at least move the Mitogen bits out of the Ansible top-level process, because that's the one that is super fork-happy.
Relevant long-lived upstream bug (that still only amounts to a partial solution - threading/forking cannot be safely mixed in the general case): https://bugs.python.org/issue6721
Hi there,
Can you please try out the wip-issue-150 branch on GitHub, or if following the usual docs installation method, grab https://github.com/dw/mitogen/archive/wip-issue-150.zip and report back :)
This code still needs a lot of cleanup work, and note that connection establishment is still single-threaded; I found a minimum-work in-between approach for now.
It works! It made it through my 45-task play on 19 hosts multiple times. Took 1.5-2 minutes with Mitogen vs. ~2.5 minutes without.
@rouge8 can you post /usr/bin/time output before/after and give me a short description of those playbook steps (I'm guessing not much with_items use?). Does it feel like "naturally it should take this long"? i.e. lots of big apt-gets, talking to far-away web services, etc.
There's one known gotcha right now: use of the copy module with large files is terribly slow. There may easily be more.
Finally, I saw some bizarre slowdowns on some local runs, so I figure there are plenty more perf bugs hiding in the multi-target support (not least single-threaded connection establishment).
The WIP branch is merged to master, and I'm considering this a solved problem. As for multi-target related performance issues, they definitely still exist, but they're outside the scope of this bug :)
Still interested in related timings or weird behaviour from all participants! This is brand new code, so there is a large chance of some weird new emergent behaviour
you post /usr/bin/time output before/after
Of course today's timing is totally different from yesterday's. In all cases, 0 tasks were changed.
Without mitogen, 19 hosts:
Playbook run took 0 days, 0 hours, 3 minutes, 12 seconds
real 3m14.335s
user 5m4.604s
sys 1m40.063s
Without mitogen, 1 host:
Playbook run took 0 days, 0 hours, 0 minutes, 55 seconds
real 0m57.332s
user 0m15.213s
sys 0m4.223s
With mitogen 3e9f01bb7af78059fb85b53b4f7a0059f860a5f5, 19 hosts:
Playbook run took 0 days, 0 hours, 2 minutes, 15 seconds
real 2m16.877s
user 4m6.684s
sys 1m6.321s
With mitogen 3e9f01bb7af78059fb85b53b4f7a0059f860a5f5, 1 host:
Playbook run took 0 days, 0 hours, 0 minutes, 28 seconds
real 0m29.581s
user 0m10.117s
sys 0m2.537s
The playbook runs 45 tasks; ~30 are custom, and the rest come from https://github.com/threatstack/threatstack-ansible and https://github.com/DataDog/ansible-datadog. No with_items.
My tasks:
- template module, 1 line
- hostname module
- lineinfile, 1 line
- lineinfile, 1 line
- lineinfile, 1 line
- lineinfile, 1 line
- copy, 7 lines
- file, state=absent
- lineinfile, 1 line
- lineinfile, 1 line
- yum, one package
- file, state=absent
- copy, 82 lines
- service, enabled=false
- cron
- service
- lineinfile, 1 line
- command: runs rpm --qf to check installed version of a package
- template, 10 lines
- set_fact
- template, 30 lines source, 88 lines once templated
- service
- get_url, ~10 lines
- template, 9 lines
- yum, install latest
- service
Here's the output from the Datadog/Threatstack tasks:
TASK [threatstack : install threatstack] ****************************************************************
[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use 'import_tasks' for static
inclusions or 'include_tasks' for dynamic inclusions. This feature will be removed in a future release.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is discouraged. The module
documentation details page may explain more about this rationale.. This feature will be removed in a
future release. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.
TASK [threatstack.threatstack-ansible : python-apt dependency.] *****************************************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : apt-transport-https dependency.] ********************************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Add Threat Stack apt repository key.] ***************************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Add Threat Stack apt repository.] *******************************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Ensure Threat Stack is installed.] ******************************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Ensure ThreatStack repo is installed.] **************************
ok: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Add ThreatStack repo GPG key.] **********************************
ok: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Ensure Agent is installed.] *************************************
ok: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Create Threat Stack Config Directory] ***************************
ok: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Create ThreatStack Config File] *********************************
ok: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Cloudsight - setup default] *************************************
ok: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Create file to track extra Cloudsight config] *******************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Configure extra cloudsight parameters] **************************
skipping: [env1-stage-bastion]
TASK [threatstack.threatstack-ansible : Test cloudsight state] ******************************************
ok: [env1-stage-bastion]
PLAY [all] **********************************************************************************************
TASK [datadog : install datadog] ************************************************************************
TASK [Datadog.datadog : Install apt-transport-https] ****************************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Install ubuntu apt-key server] **************************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Install Datadog apt-key] ********************************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure Datadog repository is up-to-date] ****************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure Datadog repository is up-to-date (agent5)] *******************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure pinned version of Datadog agent is installed] ****************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure Datadog agent is installed] **********************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Download new RPM key] ***********************************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Import new RPM key] *************************************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Install DataDog yum repo] *******************************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Install DataDog yum repo (agent5)] **********************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Install pinned datadog-agent package] *******************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Install latest datadog-agent package] *******************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Create main Datadog agent configuration file] ***********************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure datadog-agent is running] ************************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure datadog-agent is not running] ********************************************
skipping: [env1-stage-bastion]
TASK [Datadog.datadog : Create a configuration file for each Datadog check] *****************************
TASK [Datadog.datadog : Create /etc/datadog-agent] ******************************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Create main Datadog agant yaml configuration file (beta)] ***********************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Create a configuration file for each Datadog check] *****************************
TASK [Datadog.datadog : Create trace agent configuration file] ******************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Create process agent configuration file] ****************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure datadog-agent is running] ************************************************
ok: [env1-stage-bastion]
TASK [Datadog.datadog : Ensure datadog-agent is not running] ********************************************
skipping: [env1-stage-bastion]
TASK [datadog : Remove legacy dd-agent 5.x config] ******************************************************
ok: [env1-stage-bastion]
These timings are all over the place, especially CPU usage; it's not even showing the >.5x typical with a Mitogen run. Thanks, this is immensely useful!
CPU use doesn't even grow linearly for this many hosts; it's 26x vs. a 19x increase in actual load. Something is super broken.
If you do another pull, I've landed the first chunk of the multithreaded connect work. Connections are now established up to 16 at a time, and this can be increased with the MITOGEN_POOL_SIZE environment variable (although >16 isn't likely to have a tremendous effect).
This might help performance for you, but (as usual) I have no good target to test with in this location.
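For anyone curious, a rough sketch of what pooled connection establishment amounts to (everything below other than the MITOGEN_POOL_SIZE variable named above is an illustrative stand-in, not Mitogen's actual code):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# MITOGEN_POOL_SIZE is the knob mentioned above; connect() is a placeholder.
POOL_SIZE = int(os.environ.get("MITOGEN_POOL_SIZE", "16"))

def connect(host):
    # Stand-in for spawning ssh/docker and completing the Mitogen bootstrap.
    return "connected to %s" % host

def connect_all(hosts):
    # Establish up to POOL_SIZE connections concurrently instead of serially.
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        return list(pool.map(connect, hosts))

print(connect_all(["mydeb9-%d" % i for i in range(1, 20)]))
```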
Running the same play as yesterday with 3e9f01bb7af78059fb85b53b4f7a0059f860a5f5:
Playbook run took 0 days, 0 hours, 2 minutes, 9 seconds
real 2m11.507s
user 4m5.808s
sys 1m4.400s
Running with bcf5e3b9028a48f3bbaeb55676d1ff5dbb243d3f:
Playbook run took 0 days, 0 hours, 1 minutes, 58 seconds
real 2m0.281s
user 4m5.931s
sys 1m7.796s
No big differences yet, but moving in the right direction
Using mitogen with Ansible and 19 hosts consistently hangs, though not on the same host, or the same task when I rerun. I can reproduce with my own playbooks (as early as the first task) as well as https://github.com/dw/mitogen/blob/44fc8452b686fb742447cf2716f243e9d036aa72/examples/playbook/issue_131.yml. I've reproduced the hang with as few as 10 hosts. Logs at the very bottom.
When it hangs, the output is something like below, where 18/19 hosts complete the task (output from one of my playbooks, but similar on issue_131.yml):

Versions

Controller: OS X 10.11.6, Python from Homebrew.
Targets: Amazon Linux (RHEL-ish + backports + random Amazon things)
Mitogen: 44fc8452b686fb742447cf2716f243e9d036aa72

Ansible config

Logs

10 hosts, issue_131.yml