mitogen-hq / mitogen

Distributed self-replicating programs in Python
https://mitogen.networkgenomics.com/
BSD 3-Clause "New" or "Revised" License

[0.2.4] Ansible with sudo results in channel disconnect #481

Closed. berenddeschouwer closed this issue 5 years ago.

berenddeschouwer commented 5 years ago

Running Ansible with the `mitogen_linear` strategy fails with a cryptic Ansible error:

"Channel was disconnected while connection attempt was in progress; this may be caused by an abnormal Ansible exit, or due to an unreliable target."

This only happens with `--become`, and only after `MITO000` is received, so the login and the initial Python run succeed.

Using `git bisect`, I traced the regression to commit 802de6a8d585fbc24434a993aa0e2bba02920ce1, which was made for issue #406.
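For anyone repeating the bisection, a helper along these lines could drive `git bisect run`. It is a hypothetical sketch, not part of the original report: the script name `bisect_check.py`, the ad-hoc `ping` command and the timeout are assumptions. It also assumes `~/.ansible.cfg` already points `strategy_plugins` at the mitogen checkout being bisected, so each checked-out commit is the code under test.

```python
import subprocess
import sys

# Exit statuses understood by `git bisect run`: 0 marks the commit good,
# 125 skips it, and any other value from 1 to 127 marks it bad.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_status(returncode):
    """Translate a test command's return code into a `git bisect run` status."""
    if returncode == 0:
        return GOOD
    if returncode < 0:
        # subprocess reports death by a signal as a negative return code;
        # treat that as "could not test" rather than as the regression.
        return SKIP
    # Ansible exits non-zero when a host fails, which is what the broken
    # become connection produces, so any positive code marks the commit bad.
    return BAD


def main(host, timeout=300):
    """Run a become-enabled ad-hoc command against `host` and report status."""
    # Hypothetical reproduction command: any module should exercise the
    # become connection; `ping` is used only because it is cheap.
    cmd = ["ansible", host, "--become", "-m", "ping"]
    try:
        proc = subprocess.run(cmd, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # ansible is missing or the run hung: skip this commit.
        return SKIP
    return bisect_status(proc.returncode)


# Only run when a target host is passed, e.g. `python3 bisect_check.py example.com`.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Keep the script outside the repository so checkouts do not touch it, then run `git bisect start <bad> <good>` followed by `git bisect run python3 /path/to/bisect_check.py example.com`. Mapping signal deaths and timeouts to 125 (skip) rather than 1 (bad) is a judgment call; change it if a hang is itself the symptom being chased.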

Log

````
ansible 2.7.5
  config file = /home/berend/.ansible.cfg
  configured module search path = ['/home/berend/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.7.2 (default, Jan 3 2019, 09:14:01) [GCC 8.2.1 20181215 (Red Hat 8.2.1-6)]
Using /home/berend/.ansible.cfg as config file
/home/berend/Source/ansible-conf/inventory/acme did not meet host_list requirements, check plugin documentation if this is unexpected
/home/berend/Source/ansible-conf/inventory/acme did not meet script requirements, check plugin documentation if this is unexpected
Parsed /home/berend/Source/ansible-conf/inventory/acme inventory source with ini plugin
/home/berend/Source/ansible-conf/inventory/acme-stores did not meet host_list requirements, check plugin documentation if this is unexpected
/home/berend/Source/ansible-conf/inventory/acme-stores did not meet script requirements, check plugin documentation if this is unexpected
Parsed /home/berend/Source/ansible-conf/inventory/acme-stores inventory source with ini plugin
... inventory snipped ...
Parsed /home/berend/Source/ansible-conf/inventory/test-stores inventory source with ini plugin
[pid 25347] 11:38:35.537271 D mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='MainThread'): initialized
[pid 25347] 11:38:35.538238 D ansible_mitogen.process: Service pool configured: size=16
META: ran handlers
[pid 25366] 11:38:35.608042 D mitogen: unix.connect(path='/tmp/mitogen_unix_zbe0aump')
[pid 25366] 11:38:35.608895 D mitogen: unix.connect(): local ID is 1, remote is 0
[pid 25347] 11:38:35.611661 D mitogen: mitogen.ssh.Stream('default').connect()
[pid 25347] 11:38:35.762189 D mitogen: hybrid_tty_create_child() pid=25369 stdio=59, tty=13, cmd: /home/berend/Source/ansible-conf/bin/timedssh -o "LogLevel ERROR" -l ansible -o "Compression yes" -o "ServerAliveInterval 15" -o "ServerAliveCountMax 3" -o "StrictHostKeyChecking yes" -C -o ControlMaster=auto -o ControlPersist=30m -o ConnectTimeout=60 -o PasswordAuthentication=no example.com /usr/bin/python2.6 -c "'import codecs,os,sys;_=codecs.decode;exec(_(_(\"eNqFkVFLwzAUhZ/XX9G3JCzb0k0RCwVlD+KDCEXcgw5pm1sX7JKQtovbr/euE9bOB9/uxz0353CS8lVi6qlVFigLHPc9UmWIUBr3RVkcjHCWrZ1TwSMh2JlT3ieH2+jERWVqoGkfXB9WffAIaFjv0b7KGnTdhkkSEpk5rzQJMy27JXxD0TZZXkG3nrW1m+VKz+y+2RhNMOfoQjZOusMduFoZ/RYv1p0t6J1yyOQ+fXgVZJ0Mz04axIoOF3yIY0K3qjGfoOMcHGh5VyvYwURCXWxM68FNCjM5ZPH8enF1wwgL8F3vVAM04uTp8eVZCPGuCeYpjMTaWbBMPuixeGksaKybuJywqYNM0mgxvxWMk4Oy+FJpk7NuxYnPyfEvSvtrsOzmU78Xav+f+m/KaJDyB1uNs44=\".encode(),\"base64\"),\"zip\"))'"
[pid 25347] 11:38:35.762929 D mitogen: mitogen.ssh.Stream('local.25369').connect(): stdin=59, stdout=62, diag=13
[pid 25347] 11:38:36.560717 D mitogen: mitogen.ssh.Stream('local.25369'): received b'MITO000\n'
[pid 25347] 11:38:36.561234 D mitogen: mitogen.ssh.Stream('local.25369')._ec0_received()
[pid 25347] 11:38:36.649091 D mitogen: CallChain(Context(2, 'ssh.example.com')).call_async(): ansible_mitogen.target.init_child(log_level=10, candidate_temp_dirs=['~/.ansible/tmp', '/var/tmp', '/tmp'])
[pid 25347] 11:38:36.656095 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/__init__.py', 'ansible') -> ['cli', 'compat', 'config', 'constants', 'errors', 'executor', 'galaxy', 'inventory', 'module_utils', 'modules', 'parsing', 'playbook', 'plugins', 'release', 'template', 'utils', 'vars']
[pid 25347] 11:38:36.658741 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.release')
[pid 25347] 11:38:36.659362 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible')
[pid 25347] 11:38:36.661716 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/module_utils/__init__.py', 'ansible.module_utils') -> ['_text', 'acme', 'ansible_release', 'ansible_tower', 'api', 'aws', 'azure_rm_common', 'azure_rm_common_rest', 'basic', 'cloud', 'cloudscale', 'cloudstack', 'common', 'compat', 'connection', 'crypto', 'database', 'digital_ocean', 'dimensiondata', 'docker_common', 'ec2', 'exoscale', 'f5_utils', 'facts', 'firewalld', 'gcdns', 'gce', 'gcp', 'gcp_utils', 'gitlab', 'heroku', 'ibm_sa_utils', 'infinibox', 'influxdb', 'ipa', 'ismount', 'json_utils', 'k8s', 'keycloak', 'known_hosts', 'ldap', 'lxd', 'manageiq', 'memset', 'mysql', 'net_tools', 'netapp', 'netapp_elementsw_module', 'netapp_module', 'network', 'oneandone', 'oneview', 'online', 'opennebula', 'openstack', 'ovirt', 'parsing', 'postgres', 'powershell', 'pure', 'pycompat24', 'rax', 'redfish_utils', 'redhat', 'remote_management', 'scaleway', 'service', 'six', 'splitter', 'storage', 'univention_umc', 'urls', 'vca', 'vmware', 'vmware_rest_client', 'vultr', 'yumdnf']
[pid 25347] 11:38:36.661983 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils')
[pid 25347] 11:38:36.722858 D mitogen: _get_module_via_sys_modules('select') ->
[pid 25347] 11:38:36.723121 D mitogen: get_module_source('select'): cannot find source
[pid 25347] 11:38:36.734669 D mitogen: _get_module_via_sys_modules('grp') ->
[pid 25347] 11:38:36.734904 D mitogen: get_module_source('grp'): cannot find source
[pid 25347] 11:38:36.756516 D mitogen: _get_module_via_sys_modules('syslog') ->
[pid 25347] 11:38:36.756809 D mitogen: get_module_source('syslog'): cannot find source
[pid 25347] 11:38:36.787266 D mitogen: ModuleFinder()._get_module_via_pkgutil('__main__'): Error while finding loader for '__main__' (: __main__.__spec__ is None)
[pid 25347] 11:38:36.787467 D mitogen: _get_module_via_sys_modules('__main__') ->
[pid 25347] 11:38:36.801931 D mitogen: _get_module_via_sys_modules('_datetime') ->
[pid 25347] 11:38:36.802178 D mitogen: get_module_source('_datetime'): cannot find source
[pid 25347] 11:38:36.802387 D mitogen: _get_module_via_sys_modules('math') ->
[pid 25347] 11:38:36.802552 D mitogen: get_module_source('math'): cannot find source
[pid 25347] 11:38:36.806404 D mitogen: _get_module_via_sys_modules('fcntl') ->
[pid 25347] 11:38:36.806616 D mitogen: get_module_source('fcntl'): cannot find source
[pid 25347] 11:38:36.821429 D mitogen: ModuleFinder(): loading 'os.path' using <_frozen_importlib_external.SourceFileLoader object at 0x7f3a7cc9bcc0> failed: loader for posixpath cannot handle os.path
[pid 25347] 11:38:36.821665 D mitogen: _get_module_via_sys_modules('os.path') ->
[pid 25347] 11:38:36.830056 D mitogen: _get_module_via_sys_modules('_selinux') ->
[pid 25347] 11:38:36.830271 D mitogen: get_module_source('_selinux'): cannot find source
[pid 25347] 11:38:36.848111 D mitogen: _get_module_via_sys_modules('_posixsubprocess') ->
[pid 25347] 11:38:36.848367 D mitogen: get_module_source('_posixsubprocess'): cannot find source
[pid 25347] 11:38:36.862347 D mitogen: _get_module_via_sys_modules('systemd._journal') ->
[pid 25347] 11:38:36.862591 D mitogen: get_module_source('systemd._journal'): cannot find source
[pid 25347] 11:38:36.862795 D mitogen: _get_module_via_sys_modules('systemd._reader') ->
[pid 25347] 11:38:36.862977 D mitogen: get_module_source('systemd._reader'): cannot find source
[pid 25347] 11:38:36.873949 D mitogen: _get_module_via_sys_modules('zlib') ->
[pid 25347] 11:38:36.874181 D mitogen: get_module_source('zlib'): cannot find source
[pid 25347] 11:38:36.894511 D mitogen: _get_module_via_sys_modules('_hashlib') ->
[pid 25347] 11:38:36.894737 D mitogen: get_module_source('_hashlib'): cannot find source
[pid 25347] 11:38:36.895574 D mitogen: _get_module_via_sys_modules('_json') ->
[pid 25347] 11:38:36.895761 D mitogen: get_module_source('_json'): cannot find source
[pid 25347] 11:38:36.895953 D mitogen: _get_module_via_sys_modules('_struct') ->
[pid 25347] 11:38:36.896107 D mitogen: get_module_source('_struct'): cannot find source
[pid 25347] 11:38:36.899036 D mitogen: _get_module_via_sys_modules('_heapq') ->
[pid 25347] 11:38:36.899220 D mitogen: get_module_source('_heapq'): cannot find source
[pid 25347] 11:38:36.903020 D mitogen: _get_module_via_sys_modules('_random') ->
[pid 25347] 11:38:36.903227 D mitogen: get_module_source('_random'): cannot find source
[pid 25347] 11:38:36.903872 D mitogen: _get_module_via_sys_modules('_uuid') ->
[pid 25347] 11:38:36.904048 D mitogen: get_module_source('_uuid'): cannot find source
[pid 25347] 11:38:36.905784 D mitogen: _get_module_via_sys_modules('_bz2') ->
[pid 25347] 11:38:36.906003 D mitogen: get_module_source('_bz2'): cannot find source
[pid 25347] 11:38:36.907526 D mitogen: _get_module_via_sys_modules('_lzma') ->
[pid 25347] 11:38:36.907790 D mitogen: get_module_source('_lzma'): cannot find source
[pid 25347] 11:38:36.919442 D mitogen: ModuleFinder(): loading 'pkg_resources.extern.packaging' using <_frozen_importlib_external.SourceFileLoader object at 0x7f3a7c342be0> failed: loader for pkg_resources._vendor.packaging cannot handle pkg_resources.extern.packaging
[pid 25347] 11:38:36.919608 D mitogen: _get_module_via_sys_modules('pkg_resources.extern.packaging') ->
[pid 25347] 11:38:36.926420 D mitogen: ModuleFinder(): loading 'pkg_resources.extern.six' using <_frozen_importlib_external.SourceFileLoader object at 0x7f3a7c31e0f0> failed: loader for pkg_resources._vendor.six cannot handle pkg_resources.extern.six
[pid 25347] 11:38:36.926580 D mitogen: _get_module_via_sys_modules('pkg_resources.extern.six') ->
[pid 25347] 11:38:36.951445 D mitogen: ModuleFinder(): loading 'pkg_resources.extern.appdirs' using <_frozen_importlib_external.SourceFileLoader object at 0x7f3a7c325908> failed: loader for pkg_resources._vendor.appdirs cannot handle pkg_resources.extern.appdirs
[pid 25347] 11:38:36.951605 D mitogen: _get_module_via_sys_modules('pkg_resources.extern.appdirs') ->
[pid 25347] 11:38:36.955889 D mitogen: _get_module_via_sys_modules('termios') ->
[pid 25347] 11:38:36.956091 D mitogen: get_module_source('termios'): cannot find source
[pid 25347] 11:38:36.961827 D mitogen: _get_module_via_sys_modules('_bisect') ->
[pid 25347] 11:38:36.962032 D mitogen: get_module_source('_bisect'): cannot find source
[pid 25347] 11:38:36.963658 D mitogen: _get_module_via_sys_modules('binascii') ->
[pid 25347] 11:38:36.963821 D mitogen: get_module_source('binascii'): cannot find source
[pid 25347] 11:38:36.964811 D mitogen: ModuleFinder(): loading 'importlib._bootstrap_external' using failed: type object 'FrozenImporter' has no attribute 'get_filename'
[pid 25347] 11:38:36.964919 D mitogen: _get_module_via_sys_modules('importlib._bootstrap_external') ->
[pid 25347] 11:38:36.970996 D mitogen: ModuleFinder(): loading 'importlib._bootstrap' using failed: type object 'FrozenImporter' has no attribute 'get_filename'
[pid 25347] 11:38:36.971155 D mitogen: _get_module_via_sys_modules('importlib._bootstrap') ->
[pid 25347] 11:38:37.008545 D mitogen: ModuleFinder(): loading '_frozen_importlib_external' using failed: type object 'FrozenImporter' has no attribute 'get_filename'
[pid 25347] 11:38:37.008696 D mitogen: _get_module_via_sys_modules('_frozen_importlib_external') ->
[pid 25347] 11:38:37.014924 D mitogen: ModuleFinder(): loading '_frozen_importlib' using failed: type object 'FrozenImporter' has no attribute 'get_filename'
[pid 25347] 11:38:37.015092 D mitogen: _get_module_via_sys_modules('_frozen_importlib') ->
[pid 25347] 11:38:37.019718 D mitogen: _get_module_via_sys_modules('pyexpat') ->
[pid 25347] 11:38:37.019921 D mitogen: get_module_source('pyexpat'): cannot find source
[pid 25347] 11:38:37.028129 D mitogen: _get_module_via_sys_modules('_ctypes') ->
[pid 25347] 11:38:37.028362 D mitogen: get_module_source('_ctypes'): cannot find source
[pid 25347] 11:38:37.069730 D mitogen: _get_module_via_sys_modules('_opcode') ->
[pid 25347] 11:38:37.069969 D mitogen: get_module_source('_opcode'): cannot find source
[pid 25347] 11:38:37.083894 D mitogen: _get_module_via_sys_modules('markupsafe._speedups') ->
[pid 25347] 11:38:37.084127 D mitogen: get_module_source('markupsafe._speedups'): cannot find source
[pid 25347] 11:38:37.109372 D mitogen: _get_module_via_sys_modules('_yaml') ->
[pid 25347] 11:38:37.109627 D mitogen: get_module_source('_yaml'): cannot find source
[pid 25347] 11:38:37.114303 D mitogen: _get_module_via_sys_modules('_socket') ->
[pid 25347] 11:38:37.114568 D mitogen: get_module_source('_socket'): cannot find source
[pid 25347] 11:38:37.124310 D mitogen: _get_module_via_sys_modules('_pickle') ->
[pid 25347] 11:38:37.124580 D mitogen: get_module_source('_pickle'): cannot find source
[pid 25347] 11:38:37.125042 D mitogen: _get_module_via_sys_modules('_decimal') ->
[pid 25347] 11:38:37.125228 D mitogen: get_module_source('_decimal'): cannot find source
[pid 25347] 11:38:37.125667 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.config')
[pid 25347] 11:38:37.125962 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/config/__init__.py', 'ansible.config') -> ['data', 'manager']
[pid 25347] 11:38:37.126179 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.config.data')
[pid 25347] 11:38:37.126454 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.config.manager')
[pid 25347] 11:38:37.128628 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.constants')
[pid 25347] 11:38:37.129519 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.errors')
[pid 25347] 11:38:37.129752 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/errors/__init__.py', 'ansible.errors') -> ['yaml_strings']
[pid 25347] 11:38:37.130495 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.errors.yaml_strings')
[pid 25347] 11:38:37.130883 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils._text')
[pid 25347] 11:38:37.131586 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.common')
[pid 25347] 11:38:37.131897 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/module_utils/common/__init__.py', 'ansible.module_utils.common') -> ['_collections_compat', 'collections', 'dict_transformations', 'file', 'process', 'removed']
[pid 25347] 11:38:37.132098 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.common._collections_compat')
[pid 25347] 11:38:37.132341 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.common.file')
[pid 25347] 11:38:37.132827 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.common.process')
[pid 25347] 11:38:37.133198 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.parsing')
[pid 25347] 11:38:37.133431 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/module_utils/parsing/__init__.py', 'ansible.module_utils.parsing') -> ['convert_bool']
[pid 25347] 11:38:37.133601 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.parsing.convert_bool')
[pid 25347] 11:38:37.133858 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.pycompat24')
[pid 25347] 11:38:37.134193 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.six')
[pid 25347] 11:38:37.134446 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/module_utils/six/__init__.py', 'ansible.module_utils.six') -> []
[pid 25347] 11:38:37.136610 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.parsing')
[pid 25347] 11:38:37.137034 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/parsing/__init__.py', 'ansible.parsing') -> ['ajson', 'dataloader', 'metadata', 'mod_args', 'plugin_docs', 'quoting', 'splitter', 'utils', 'vault', 'yaml']
[pid 25347] 11:38:37.137259 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.parsing.quoting')
[pid 25347] 11:38:37.137560 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.utils')
[pid 25347] 11:38:37.138396 D mitogen: _build_tuple('/usr/lib/python3.7/site-packages/ansible/utils/__init__.py', 'ansible.utils') -> ['cmd_functions', 'color', 'display', 'encrypt', 'hashing', 'helpers', 'jsonrpc', 'listify', 'module_docs_fragments', 'path', 'plugin_docs', 'py3compat', 'shlex', 'ssh_functions', 'unicode', 'unsafe_proxy', 'vars']
[pid 25347] 11:38:37.138608 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.utils.color')
[pid 25347] 11:38:37.139301 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.utils.display')
[pid 25347] 11:38:37.140592 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.utils.path')
[pid 25347] 11:38:37.141007 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.utils.py3compat')
[pid 25347] 11:38:37.141317 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.basic')
[pid 25347] 11:38:37.142399 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible.module_utils.json_utils')
[pid 25347] 11:38:37.142998 D mitogen: _build_tuple('/home/berend/Source/mitogen/ansible_mitogen/__init__.py', 'ansible_mitogen') -> ['connection', 'loaders', 'logging', 'mixins', 'module_finder', 'parsing', 'planner', 'plugins', 'process', 'runner', 'services', 'strategy', 'target']
[pid 25347] 11:38:37.143191 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible_mitogen')
[pid 25347] 11:38:37.179385 D mitogen: _get_module_via_sys_modules('resource') ->
[pid 25347] 11:38:37.179599 D mitogen: get_module_source('resource'): cannot find source
[pid 25347] 11:38:37.182006 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible_mitogen.target')
[pid 25347] 11:38:37.184368 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.fork')
[pid 25347] 11:38:37.185031 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.parent')
[pid 25347] 11:38:37.194995 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.select')
[pid 25347] 11:38:37.195856 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.service')
[pid 25347] 11:38:37.199676 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'ansible_mitogen.runner')
[pid 25347] 11:38:37.200412 D mitogen: _build_tuple('/home/berend/Source/mitogen/mitogen/__init__.py', 'mitogen') -> ['compat', 'core', 'debug', 'doas', 'docker', 'fakessh', 'fork', 'jail', 'kubectl', 'lxc', 'lxd', 'master', 'minify', 'parent', 'select', 'service', 'setns', 'ssh', 'su', 'sudo', 'unix', 'utils']
[pid 25347] 11:38:37.582152 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'encodings.utf_8' is submodule of a package we did not load
[pid 25347] 11:38:37.653629 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'encodings.ascii' is submodule of a package we did not load
[pid 25347] 11:38:37.654140 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.decoder' is submodule of a package we did not load
[pid 25347] 11:38:37.654611 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.re' is submodule of a package we did not load
[pid 25347] 11:38:37.655147 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.sys' is submodule of a package we did not load
[pid 25347] 11:38:37.655526 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.json' is submodule of a package we did not load
[pid 25347] 11:38:37.655845 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.scanner' is submodule of a package we did not load
[pid 25347] 11:38:37.656149 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.sre_parse' is submodule of a package we did not load
[pid 25347] 11:38:37.656468 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.sre_compile' is submodule of a package we did not load
[pid 25347] 11:38:37.656763 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.sre_constants' is submodule of a package we did not load
[pid 25347] 11:38:37.657048 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json._json' is submodule of a package we did not load
[pid 25347] 11:38:37.657360 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.encoder' is submodule of a package we did not load
[pid 25347] 11:38:37.657662 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'json.math' is submodule of a package we did not load
[pid 25347] 11:38:37.657952 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.logging'
[pid 25347] 11:38:37.658236 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.os'
[pid 25347] 11:38:37.658547 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.random'
[pid 25347] 11:38:37.658831 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.sys'
[pid 25347] 11:38:37.659113 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.threading'
[pid 25347] 11:38:37.659381 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.traceback'
[pid 25347] 11:38:37.659497 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.mitogen'
[pid 25347] 11:38:37.659611 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.codecs'
[pid 25347] 11:38:37.659723 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.errno'
[pid 25347] 11:38:37.659836 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.fcntl'
[pid 25347] 11:38:37.683168 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.getpass'
[pid 25347] 11:38:37.683501 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.inspect'
[pid 25347] 11:38:37.683883 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.signal'
[pid 25347] 11:38:37.684189 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.socket'
[pid 25347] 11:38:37.684434 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.subprocess'
[pid 25347] 11:38:37.684647 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.termios'
[pid 25347] 11:38:37.684855 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.textwrap'
[pid 25347] 11:38:37.685062 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.time'
[pid 25347] 11:38:37.685268 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.zlib'
[pid 25347] 11:38:37.685501 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.cStringIO'
[pid 25347] 11:38:37.685706 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.functools'
[pid 25347] 11:38:37.685906 D mitogen: ModuleResponder(Router(Broker(0x7f3a7a1a3160)))._on_get_module(b'mitogen.compat')
[pid 25347] 11:38:37.686985 D mitogen: _build_tuple('/home/berend/Source/mitogen/mitogen/compat/__init__.py', 'mitogen.compat') -> ['functools', 'pkgutil', 'tokenize']
[pid 25347] 11:38:37.687482 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.compat')
[pid 25347] 11:38:37.817453 D mitogen: ModuleResponder(Router(Broker(0x7f3a7a1a3160)))._on_get_module(b'mitogen.compat.functools')
[pid 25347] 11:38:37.826707 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.compat.functools')
[pid 25347] 11:38:37.898020 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.compat.threading'
[pid 25347] 11:38:37.962915 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.grp'
[pid 25347] 11:38:37.963497 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.pprint'
[pid 25347] 11:38:37.963908 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.pwd'
[pid 25347] 11:38:37.964272 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.stat'
[pid 25347] 11:38:37.964642 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.json'
[pid 25347] 11:38:37.964968 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'ctypes.os' is submodule of a package we did not load
[pid 25347] 11:38:37.965282 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'ctypes.sys' is submodule of a package we did not load
[pid 25347] 11:38:37.965615 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'ctypes._ctypes' is submodule of a package we did not load
[pid 25347] 11:38:37.965912 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'ctypes.struct' is submodule of a package we did not load
[pid 25347] 11:38:37.966206 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'ctypes.ctypes' is submodule of a package we did not load
[pid 25347] 11:38:37.966520 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'ctypes._endian' is submodule of a package we did not load
[pid 25347] 11:38:37.986350 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.codecs'
[pid 25347] 11:38:38.010758 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.ansible'
[pid 25347] 11:38:38.011471 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.sys'
[pid 25347] 11:38:38.011981 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.ast'
[pid 25347] 11:38:38.012420 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.six.moves'
[pid 25347] 11:38:38.012867 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'ansible.module_utils.parsing.ansible'
[pid 25347] 11:38:38.013284 D mitogen.ctx.ssh.example.com: ansible_mitogen.runner: EnvironmentFileWatcher(u'/home/ansible/.pam_environment') installed; existing keys: []
[pid 25347] 11:38:38.013671 D mitogen.ctx.ssh.example.com: ansible_mitogen.runner: EnvironmentFileWatcher(u'/etc/environment') installed; existing keys: []
[pid 25347] 11:38:38.014018 D mitogen.ctx.ssh.example.com: mitogen: replaced Poller(0x81580ec) with EpollPoller(0x8251f6c) (new: 4 readers, 1 writers; old: 4 readers, 1 writers)
[pid 25347] 11:38:38.014395 D mitogen.ctx.ssh.example.com: mitogen: Router(Broker(0x8153fec)).upgrade()
[pid 25347] 11:38:38.014574 D mitogen: IdAllocator(Router(Broker(0x7f3a7a1a3160))): allocating [3..1003)
[pid 25347] 11:38:38.014734 D mitogen: IdAllocator(Router(Broker(0x7f3a7a1a3160))): allocating [3..1003) to Context(2, 'ssh.example.com')
[pid 25347] 11:38:38.070738 D mitogen.ctx.ssh.example.com: mitogen: mitogen.fork.Stream(u'default').connect()
[pid 25347] 11:38:38.135100 D mitogen.ctx.ssh.example.com: mitogen: mitogen.fork.Stream(u'fork.20976').connect(): stdin=16, stdout=15, diag=None
[pid 25347] 11:38:38.135508 D mitogen: Adding route to 3 via mitogen.ssh.Stream('ssh.example.com')
[pid 25347] 11:38:38.135725 D mitogen: Router(Broker(0x7f3a7a1a3160)).add_route(3, mitogen.ssh.Stream('ssh.example.com'))
[pid 25347] 11:38:38.135963 D mitogen.ctx.ssh.example.com: ansible_mitogen.target: Selected temp directory: u'/home/ansible/.ansible/tmp' (from [u'/home/ansible/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/ansible'])
[pid 25347] 11:38:38.136275 D mitogen.ctx.fork.20976: mitogen: register(Context(2, 'parent'), mitogen.core.Stream('parent'))
[pid 25347] 11:38:38.136515 D mitogen.ctx.fork.20976: mitogen: Connected to Context(2, 'parent'); my ID is 3, PID is 20976
[pid 25347] 11:38:38.136720 D mitogen.ctx.fork.20976: mitogen: Recovered sys.executable: '/usr/bin/python2.6'
[pid 25347] 11:38:38.138286 D mitogen: CallChain(Context(2, 'ssh.example.com')).call_async(): mitogen.parent._proxy_connect(name=None, method_name='sudo', kwargs=Kwargs({'unidirectional': True, 'username': 'root', 'password': [secret], 'python_path': ['/usr/bin/python2.6'], 'sudo_path': None, 'connect_timeout': 120, 'sudo_args': ['-H', '-S', '-n'], 'debug': False, 'profiling': False}))
[pid 25347] 11:38:38.189918 D mitogen: ModuleResponder(Router(Broker(0x7f3a7a1a3160)))._on_get_module(b'mitogen.sudo')
[pid 25347] 11:38:38.207151 D mitogen: _send_load_module(mitogen.ssh.Stream('ssh.example.com'), 'mitogen.sudo')
[pid 25347] 11:38:38.266280 D mitogen.ctx.ssh.example.com: mitogen: Importer(): master doesn't know 'mitogen.optparse'
[pid 25347] 11:38:38.333013 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'default').connect()
[pid 25347] 11:38:38.333633 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'encodings.base64_codec' is submodule of a package we did not load
[pid 25347] 11:38:38.334088 D mitogen.ctx.ssh.example.com: mitogen: Importer(): 'encodings.base64' is submodule of a package we did not load
[pid 25347] 11:38:38.334528 D mitogen.ctx.ssh.example.com: mitogen.sudo: sudo command line: ['sudo', '-u', u'root', '-H', '--', u'/usr/bin/python2.6', '-c', u'import codecs,os,sys;_=codecs.decode;exec(_(_("eNqFkTFrwzAQhef4V3g7iQhHjilpDIaWDKVDKZjSDG0osi23Io4kZCdu8ut7sQux06HbfXrv7h2nlK0TUwdWWUmo51g7IFX6CKVxW0Jjb4J1sbdzwlnIOb1wyobkUA17zitTS5IOwQ1hPYQWAQPrI8ZXosHUnZ8kPhTCtUqDL3TRifJb5vtGZJXs5Nm+drNM6Zk9Nl9GA+45ubJNk67xIF2tjH6Lo00XK/VBOWS4Tx9eOWyScVvvQazIWGBjnALZqcZ8Sh0LXSt8uTOLRXTDeWC2WZCb4CTiOb9dLilQD2e2TjWShAyeHl+eOefvGnCX3BR4cuqtkg9yPnphrNR4anAZ0MBJUZAwisI5ZXBSFieVNrn41gzaDM7/UNrfgFVX97e9crf/uf9uGY62/AEDZLCN".encode(),"base64"),"zip"))']
[pid 25347] 11:38:38.335066 D mitogen.ctx.ssh.example.com: mitogen: hybrid_tty_create_child() pid=20978 stdio=18, tty=17, cmd: sudo -u root -H -- /usr/bin/python2.6 -c "import codecs,os,sys;_=codecs.decode;exec(_(_(\"eNqFkTFrwzAQhef4V3g7iQhHjilpDIaWDKVDKZjSDG0osi23Io4kZCdu8ut7sQux06HbfXrv7h2nlK0TUwdWWUmo51g7IFX6CKVxW0Jjb4J1sbdzwlnIOb1wyobkUA17zitTS5IOwQ1hPYQWAQPrI8ZXosHUnZ8kPhTCtUqDL3TRifJb5vtGZJXs5Nm+drNM6Zk9Nl9GA+45ubJNk67xIF2tjH6Lo00XK/VBOWS4Tx9eOWyScVvvQazIWGBjnALZqcZ8Sh0LXSt8uTOLRXTDeWC2WZCb4CTiOb9dLilQD2e2TjWShAyeHl+eOefvGnCX3BR4cuqtkg9yPnphrNR4anAZ0MBJUZAwisI5ZXBSFieVNrn41gzaDM7/UNrfgFVX97e9crf/uf9uGY62/AEDZLCN\".encode(),\"base64\"),\"zip\"))"
[pid 25347] 11:38:38.336162 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'local.20978').connect(): stdin=18, stdout=19, diag=17
[pid 25347] 11:38:38.336642 D mitogen.ctx.ssh.example.com: mitogen.sudo: mitogen.sudo.Stream(u'local.20978'): received 'MITO000\n'
[pid 25347] 11:38:38.337212 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'local.20978')._ec0_received()
[pid 25347] 11:38:38.356573 D mitogen: Adding route to 4 via mitogen.ssh.Stream('ssh.example.com')
[pid 25347] 11:38:38.356859 D mitogen: Router(Broker(0x7f3a7a1a3160)).add_route(4, mitogen.ssh.Stream('ssh.example.com'))
[pid 25347] 11:38:38.357398 D mitogen: CallChain(Context(4, 'ssh.example.com.sudo.root')).call_async(): ansible_mitogen.target.init_child(log_level=10, candidate_temp_dirs=['~/.ansible/tmp', '/var/tmp', '/tmp'])
[pid 25347] 11:38:38.379666 D mitogen.ctx.ssh.example.com: mitogen: mitogen.parent.DiagLogStream(fd=17, u'sudo.root').on_disconnect()
[pid 25347] 11:38:38.444416 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'sudo.root').on_disconnect()
[pid 25347] 11:38:38.444959 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'sudo.root') is gone; propagating DEL_ROUTE for set([4])
[pid 25347] 11:38:38.445390 D mitogen.ctx.ssh.example.com: mitogen: Router(Broker(0x8153fec)).del_route(4)
[pid 25347] 11:38:38.445799 D mitogen: : Firing local disconnect for Context(4, 'ssh.example.com.sudo.root')
[pid 25347] 11:38:38.446150 I ansible_mitogen.services: Forgetting Context(4, 'ssh.example.com.sudo.root') due to stream disconnect
[pid 25347] 11:38:38.446695 D ansible_mitogen.services: ContextService(): attempt to forget unknown Context(4, 'ssh.example.com.sudo.root')
[pid 25347] 11:38:38.447099 D mitogen: Deleting route to 4 via mitogen.ssh.Stream('ssh.example.com')
[pid 25347] 11:38:38.447505 D mitogen: Router(Broker(0x7f3a7a1a3160)).del_route(4)
[pid 25347] 11:38:38.448069 D mitogen.ctx.ssh.example.com: mitogen: mitogen.parent.DiagLogStream(fd=17, u'sudo.root').on_disconnect()
[pid 25347] 11:38:38.448557 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'sudo.root'): PID 20978 exited due to signal 1 (SIGHUP)
[pid 25366] 11:38:38.451092 D mitogen: mitogen.core.Stream('unix_listener.25347').on_disconnect()
[pid 25347] 11:38:38.451908 D mitogen: mitogen.core.Stream('unix_client.25366').on_disconnect()
[pid 25366] 11:38:38.451933 D mitogen: Waker(Broker(0x7f3a7995d1d0) rfd=11, wfd=12).on_disconnect()
example.com | FAILED! => { "msg": "Channel was disconnected while connection attempt was in progress; this may be caused by an abnormal Ansible exit, or due to an unreliable target." }
[pid 25347] 11:38:38.459850 D mitogen: Waker(Broker(0x7f3a7a1a3160) rfd=6, wfd=8).on_disconnect()
[pid 25347] 11:38:38.460242 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-1'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.460560 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-2'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.460743 D mitogen: mitogen.parent.DiagLogStream(fd=13, 'ssh.example.com').on_disconnect()
[pid 25347] 11:38:38.460942 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-3'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.461295 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-4'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.461456 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-5'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.462662 D mitogen: mitogen.ssh.Stream('ssh.example.com') closing CALL_FUNCTION channel
[pid 25347] 11:38:38.462833 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-6'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.463042 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-7'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.463164 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-8'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.463412 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-9'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.463677 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-10'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.463746 D mitogen: mitogen.ssh.Stream('ssh.example.com').on_disconnect()
[pid 25347] 11:38:38.463867 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-12'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.464105 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-13'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.464438 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-14'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.464499 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-15'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.464701 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-11'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.464792 I mitogen: mitogen.service.Pool(0x7f3a7a1a3828, size=16, th='mitogen.service.Pool.7f3a7a1a3828.worker-0'): channel or latch closed, exitting: None
[pid 25347] 11:38:38.465084 D mitogen: mitogen.ssh.Stream('ssh.example.com') is gone; propagating DEL_ROUTE for {2, 3}
[pid 25347] 11:38:38.465899 D mitogen: Router(Broker(0x7f3a7a1a3160)).del_route(2)
[pid 25347] 11:38:38.466019 I ansible_mitogen.services: Forgetting Context(2, 'ssh.example.com') due to stream disconnect
[pid 25347] 11:38:38.466132 D mitogen: Router(Broker(0x7f3a7a1a3160)).del_route(3)
[pid 25347] 11:38:38.466239 D mitogen: mitogen.parent.DiagLogStream(fd=13, 'ssh.example.com').on_disconnect()
[pid 25347] 11:38:38.466434 D mitogen: 
mitogen.ssh.Stream('ssh.example.com'): PID 25369 exited with return code 255 ````
dw commented 5 years ago

This is some fabulous debugging on your part, thanks so much! It definitely looks like some kind of early startup crash, because there are no messages after ec0_received suggesting that, e.g., the child started importing modules. The 'DiagLogStream.on_disconnect()' means the bootstrap had completed entirely and only the main process remained on the remote. There is no good debug message meaning "we registered the new stream with the IO loop"; instead, you only see the disconnect messages generated by the IO loop.

Do you have strace available on one of those machines? Is there any possibility you could try out this trick? Otherwise I will start combing the work done in November that you have isolated.

Currently working on some other stuff, but will get to this ASAP. Thanks again!

dw commented 5 years ago

Ah, doh, forget that. strace won't work here because it can't trace a setuid program like sudo; you'll just get a completely different failure instead. If you're feeling brave, you could instead attach the same 'strace -ff' to the top-level sshd on the box. When it forks to accept the Ansible client, the tracer will eventually be installed on every process, including the Python interpreter that is dying. This will produce a ton of files; the easiest way to find a relevant one is 'grep -E execve.*python /tmp/....'
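For anyone who does try the sshd trick, the grep suggested above can also be done from Python. A minimal sketch, assuming the traces were written with `strace -ff -o PREFIX`, which writes one `PREFIX.<pid>` file per traced process; `find_python_traces` is a hypothetical helper, and the actual prefix is whatever you passed to `-o`.

```python
import glob
import re
import sys

# Same pattern as the `grep -E execve.*python` suggested above.
EXECVE_PYTHON = re.compile(r"execve.*python")


def find_python_traces(prefix):
    """Return the strace -ff output files (PREFIX.<pid>) that record a
    python execve(). `prefix` is the value given to `strace -o`."""
    matches = []
    for path in sorted(glob.glob(glob.escape(prefix) + ".*")):
        with open(path, errors="replace") as fp:
            if any(EXECVE_PYTHON.search(line) for line in fp):
                matches.append(path)
    return matches


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in find_python_traces(sys.argv[1]):
        print(path)
```

The matching files are the per-process traces worth reading first, since one of them should contain the dying interpreter.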

dw commented 5 years ago

The commit you flagged was painful; if I remember correctly, it broke a few things. I will find time to look at all the setup code it touches, as it's very possibly the culprit.

Note that the ticket was also about cleaning up many FD leaks during connect. One case I definitely recall was my process being killed by the kernel because the controlling side of the child's PTY ended up owned by the parent. This code changed the order in which FDs were opened and closed, so it could somehow be related to the parent's interaction with the master side of the PTY.

 [pid 25347] 11:38:38.448557 D mitogen.ctx.ssh.example.com: mitogen: mitogen.sudo.Stream(u'sudo.root'): PID 20978 exited due to signal 1 (SIGHUP)

Hangup! That is very TTY-related. Okay, that's a working theory; I just need to find time to look into this. Again, thanks a ton for the bisecting, it helps so much.

dw commented 5 years ago

I'm starting to recall a variant of this very issue, because I had the sudoers manpage open very recently, scanning for FD-related options. Do you have any strange 'Defaults' lines in your sudoers, or maybe an SELinux policy loaded? It would be nice to know why I'm not seeing this here.

dw commented 5 years ago

The only reason sudo could be getting SIGHUP from the kernel is if the parent process closed the master side of the TTY, so I don't think this can be explained by any sudoers options; sudo doesn't even receive a copy of the relevant FD.
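The kernel behaviour being relied on here can be reproduced without sudo or Mitogen. The sketch below (Python 3, assuming Linux or another Unix where `pty.fork()` works; `hangup_demo` is a hypothetical name) gives a child a PTY slave as its controlling TTY, closes the master in the parent, and reports which signal killed the child.

```python
import os
import pty
import signal
import time


def hangup_demo():
    """Return the signal number that killed a child whose controlling TTY
    was hung up by closing the PTY master, or None if it exited normally."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: pty.fork() made us a session leader with the PTY slave as
        # our controlling terminal and as fds 0, 1 and 2.
        try:
            signal.signal(signal.SIGHUP, signal.SIG_DFL)  # undo any inherited SIG_IGN
            os.write(1, b"R")  # tell the parent we are ready
            time.sleep(10)  # SIGHUP should terminate us during this sleep
        finally:
            os._exit(0)
    # Parent: wait until the child is running with the slave as its TTY.
    os.read(master_fd, 1)
    # Closing the only master FD hangs up the slave; the kernel then sends
    # SIGHUP to the session leader that has it as controlling TTY.
    os.close(master_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    sig = hangup_demo()
    print("child killed by", signal.Signals(sig).name if sig else None)
```

On Linux this should report SIGHUP, the same signal as the `PID 20978 exited due to signal 1 (SIGHUP)` line in the log.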

berenddeschouwer commented 5 years ago

No selinux.

The only potentially relevant sudo option, I think, is not-requiretty.

It's an old machine. LTS, but old, so it could be a kernel thing. It's also Python 2.6, so I'm trying to see if I can get a different version side-by-side. I want a later Python (3.x) for other reasons anyway; but it's going to take a while.

BTW, sudo will quit if run under strace. I did try sudo debug logging, but got minimal information, so I'd have to compile a debug version.

dw commented 5 years ago

It's fine, don't worry about strace; SIGHUP really is the smoking gun. In this scenario the only thing that sends it is the kernel, and only when the Python code in the parent closes the master FD while a child still has the slave end as its controlling TTY.

Another possibility is that the master FD is being overwritten. There are existing tickets open about dup2() safety in the code, and it has bitten us a few times before. Those bugs tend to reveal themselves at strange times, and they always manifest as some unrelated component failing.
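To illustrate why dup2() mistakes surface as an unrelated failure: `os.dup2(src, dst)` silently closes whatever `dst` previously referred to. A minimal sketch using two ordinary pipes, where pipe 1 stands in for something like the PTY master; `dup2_clobber_demo` is a hypothetical name, not Mitogen code.

```python
import os


def dup2_clobber_demo():
    """Return (data read from pipe 1, data read from pipe 2) after
    overwriting pipe 1's only write end with a copy of pipe 2's."""
    r1, w1 = os.pipe()  # stands in for an important FD, e.g. a PTY master
    r2, w2 = os.pipe()
    # No error is raised even though w1 is in use: the kernel closes
    # pipe 1's write end and reuses the number for a copy of w2.
    os.dup2(w2, w1)
    os.write(w1, b"x")  # lands in pipe 2, not pipe 1
    data2 = os.read(r2, 1)
    os.close(w1)
    os.close(w2)
    # Pipe 1 lost its only writer at dup2() time, so its reader sees EOF.
    data1 = os.read(r1, 1)
    os.close(r1)
    os.close(r2)
    return data1, data2
```

Whoever owns pipe 1 never sees an error from dup2(); it only notices later, as an EOF on the reading side.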

dw commented 5 years ago

FWIW, when I'm debugging this stuff I find it useful to litter the parent.py connect functions with "os.system('ls -l /proc/%d/fd' % os.getpid())" :) It's a total hack, but the only way to get useful diagnostic info out at the exact moment it's meaningful.
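A slightly less hacky variant of the `ls -l /proc/%d/fd` trick returns the FD table as a dict, so it can be logged or diffed between two points in the connect code. A Linux-only sketch; `dump_fds` is a hypothetical helper, not part of Mitogen.

```python
import os


def dump_fds(pid=None):
    """Return {fd: target} for every open FD of `pid` (default: this
    process), read from /proc/<pid>/fd like `ls -l /proc/<pid>/fd`."""
    fd_dir = "/proc/%d/fd" % (pid if pid is not None else os.getpid())
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The FD vanished between listdir() and readlink(), e.g. the
            # directory FD that listdir() itself opened and closed.
            pass
    return result


if __name__ == "__main__":
    for fd, target in sorted(dump_fds().items()):
        print(fd, "->", target)
```

Calling it before and after a step and comparing the two dicts shows exactly which FDs that step opened, closed, or replaced.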

dw commented 5 years ago

Hi Berend :)

Just a few questions to narrow this a bit:

dw commented 5 years ago

Another idea is that for whatever reason on your config, the sudo invocation is closing the last FD connected to the slave PTY. That causes DiagLogStream.on_disconnect() to fire, which in turn causes the master to close the master side. That would trigger the SIGHUP.

If you're feeling daring, comment out these two lines in parent.py (line ~1330):

        if self.diag_stream:
            self._router.broker.start_receive(self.diag_stream)

That would prevent it from trying to read from the master TTY, and thus never noticing slave disconnect. The master FD would still be cleaned up as normal during the data FD's on_disconnect().
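The first half of that chain, where the master side notices that the last slave FD has closed, can be observed with a plain PTY pair. A minimal sketch, assuming Linux, where a read on the master then fails with EIO; on some other systems it returns EOF instead, so both are treated as a disconnect. `slave_disconnected` is a hypothetical helper, not Mitogen's DiagLogStream code.

```python
import errno
import os
import select


def slave_disconnected(master_fd, timeout=1.0):
    """Return True if reading `master_fd` reports that every slave FD is
    closed (EIO on Linux, EOF elsewhere); False if nothing arrives within
    `timeout` seconds or if ordinary output was read."""
    readable, _, _ = select.select([master_fd], [], [], timeout)
    if not readable:
        return False
    try:
        return os.read(master_fd, 4096) == b""
    except OSError as e:
        if e.errno == errno.EIO:
            return True
        raise


if __name__ == "__main__":
    master, slave = os.openpty()
    print(slave_disconnected(master, timeout=0.1))  # slave still open
    os.close(slave)
    print(slave_disconnected(master))  # last slave FD is gone
    os.close(master)
```

An IO loop that treats this condition as a disconnect and closes the master in response would, per the comment above, end up hanging up any process that still has that TTY as its controlling terminal.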

berenddeschouwer commented 5 years ago

Hi Berend :)

Just a few questions to narrow this a bit:

* Was the bisect 'bang on the money'? There were quite some interesting cleanups that day

* Does this still manifest with latest master?

* What's the target OS / Python version / sudo version, I may need to setup a VM

Yes, I checked forwards and backwards twice. And yes, it's still a problem with master.

It happens reliably on older machines but not on newer ones, so it may depend on the kernel, Python, or sudo version.

I'm collecting more data on where it's a problem, i.e. bisecting by OS.

* Broken: Python 2.6.8 / old kernel / sudo 1.6.8 (RHEL patched)
* Working: same Python / same kernel / sudo 1.7.2 (RHEL patched)
* Working: Python 2.6.6 / newer kernel / newer sudo 1.7.4 (RHEL patched)
* Working: Python 2.6.6 / newer kernel / newer sudo 1.8.6 (RHEL patched)
* Working: brand new

I think I found my solution. I shouldn't be running sudo 1.6.8 anywhere anyway. 1.7 still received critical fixes.

dw commented 5 years ago

Hi Berend,

I'd like to reproduce this before release, could you tell me which precise version of RHEL/CentOS the sudo comes from? Otherwise I can try with CentOS 5.11

dw commented 5 years ago

I have reproduced it on 5.0

Log ```` TASK [setup ] **************************************************************************************************************************************************************** ERROR! [pid 10397] 18:06:14.402474 E mitogen: defer() crashed: (*(Context(3, u'ssh.172.16.80.131.sudo.root'), (u'ansible.module_utils.basic', u'ansible.module_utils.json_utils', u'ansible.release', u'ansible_mitogen.runner', u'ansible_mitogen.target', u'mitogen.fork', u'mitogen.service')), **{}) Traceback (most recent call last): File "/home/dmw/src/mitogen/mitogen/core.py", line 2330, in on_receive func(*args, **kwargs) File "/home/dmw/src/mitogen/mitogen/master.py", line 932, in _forward_modules self._forward_one_module(context, mitogen.core.to_text(fullname)) File "/home/dmw/src/mitogen/mitogen/master.py", line 926, in _forward_one_module self._send_module_and_related(stream, fullname) File "/home/dmw/src/mitogen/mitogen/master.py", line 865, in _send_module_and_related if fullname in stream.sent_modules: AttributeError: 'NoneType' object has no attribute 'sent_modules' ERROR! [pid 10397] 18:06:19.409809 E mitogen: Broker(0x7f22158cd450): some streams did not close gracefully. The most likely cause for this is one or more child processes still connected to our stdout/stderr pipes. An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ChannelError: the respondent Context has disconnected fatal: /home/dmw/src/mitogen/tests/ansible/issue481.yml:10: [172.16.80.131]: FAILED! 
=> { exception: Traceback (most recent call last): exception: File "/home/dmw/src/mitogen/.venv/local/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 125, in run exception: res = self._execute() exception: File "/home/dmw/src/mitogen/.venv/local/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 522, in _execute exception: result = self._handler.run(task_vars=variables) exception: File "/home/dmw/src/mitogen/ansible_mitogen/mixins.py", line 115, in run exception: return super(ActionModuleMixin, self).run(tmp, task_vars) exception: File "/home/dmw/src/mitogen/.venv/local/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 45, in run exception: results = merge_hash(results, self._execute_module(tmp=tmp, task_vars=task_vars, wrap_async=wrap_async)) exception: File "/home/dmw/src/mitogen/ansible_mitogen/mixins.py", line 333, in _execute_module exception: self._connection._connect() exception: File "/home/dmw/src/mitogen/ansible_mitogen/connection.py", line 747, in _connect exception: self._connect_stack(stack) exception: File "/home/dmw/src/mitogen/ansible_mitogen/connection.py", line 672, in _connect_stack exception: stack=mitogen.utils.cast(list(stack)), exception: File "/home/dmw/src/mitogen/mitogen/core.py", line 1844, in call_service exception: return recv.get().unpickle() exception: File "/home/dmw/src/mitogen/mitogen/core.py", line 1025, in get exception: msg._throw_dead() exception: File "/home/dmw/src/mitogen/mitogen/core.py", line 788, in _throw_dead exception: raise ChannelError(self.data.decode('utf-8', 'replace')) exception: ChannelError: the respondent Context has disconnected failed: True msg: Unexpected failure during module execution. } ````
dw commented 5 years ago

I think I have an idea of what's going on: this bizarre old version of sudo doesn't hang around after it starts the child process. In fact, it overwrites itself with Python.

Python then forks into a 'first stage', which keeps the TTY FD around, while the parent half of the fork re-execs Python to clean the inherited argv.

Setup of the parent half completes, including overwriting the inherited TTY FD with an IoLogger to capture spurious output generated by subprocesses, and then the first stage exits, closing the last remaining open TTY FD.

dw commented 5 years ago

Thanks a ton for persisting with this!

This is now on the master branch and will make it into the next release. To be updated when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/

Thanks for reporting this!