Open jimmymccrory opened 5 years ago
@s1113950 do you have any ideas on this? Every so often we try to get openstack-ansible and Mitogen working together, and this looks like one of the final remaining issues we have.
At first glance I'm not sure, but here's how I run Mitogen using delegate_to pointing at a container that I created in the playbook run (note that I don't use lxc_container, though): https://github.com/s1113950/mitogen-test/blob/7a39ef020712e8ff225a3343d72b56c96d71382a/roles/run_test/tasks/main.yml#L64 . If possible, you could also try upgrading to one of the new Mitogen tags (https://github.com/mitogen-hq/mitogen/releases/tag/v0.2.10-rc.0 if you need Ansible 2.7 support) to see if the issue still exists.
So I've recreated this issue using Ansible 2.10.6 and Mitogen v0.3.0rc1 with Docker, using a mishmash of the Ansible code jimmymccrory provided and the mitogen-test repo linked above.
In OpenStack-Ansible we have CI jobs that essentially spin up a small cloud on a single physical host, with the various cloud services running in LXC containers on that same host. This is our use case for wanting mitogen_via=localhost to work.
The code I have been looking at most closely is the _stack_from_spec method in connection.py, specifically the lines around cycle detection. https://github.com/mitogen-hq/mitogen/blob/cc8f9a016965876bcd9ec390d53035d6ed842b07/ansible_mitogen/connection.py#L734
Having added a couple of debug log statements, we can see more detailed output of what happens when we try to delegate a task to a container on the localhost using mitogen_via=localhost.
TASK [gather and delegate facts] ***************************************************************************************************************************
task path: /home/ubuntu/mitogen-delegation-bug/docker-reproduce-bug.yml:36
[task 123955] 12:36:29.856573 D ansible_mitogen.affinity: CPU mask for WorkerProcess: 0x000001
[task 123955] 12:36:29.863105 D ansible_mitogen.connection: In _stack_from_spec, spec.inventory_name: localhost, seen_names: (), spec.mitogen_via: localhost
[task 123955] 12:36:29.863280 D ansible_mitogen.connection: Calling _stack_from_spec(spec_from_via))
[task 123955] 12:36:29.867411 D ansible_mitogen.connection: In _stack_from_spec, spec.inventory_name: localhost, seen_names: ('localhost',), spec.mitogen_via: None
[task 123955] 12:36:29.867745 D ansible_mitogen.mixins: _remove_tmp_path(None)
fatal: [localhost]: UNREACHABLE! => {
    "changed": false,
    "msg": "mitogen_via=None of localhost creates a cycle (localhost -> localhost)",
    "unreachable": true
}
If mitogen_via is set, _stack_from_spec is called a second time with the current inventory_name appended to seen_names; in this case both the inventory_name and the via target are localhost. https://github.com/mitogen-hq/mitogen/blob/cc8f9a016965876bcd9ec390d53035d6ed842b07/ansible_mitogen/connection.py#L749 This leads to a cycle being detected in _stack_from_spec, because localhost is the inventory_name and is already in seen_names.
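To make the failure mode easier to follow, here is a minimal, self-contained sketch of the recursion described above. It is a simplified model written from the debug log and this description, not a copy of Mitogen's source: the `Spec` dataclass, `ViaCycleError`, the `spec_from_via` lookup callable, and the transport names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class ViaCycleError(Exception):
    """Raised when following mitogen_via revisits an inventory name."""


@dataclass(frozen=True)
class Spec:
    # Hypothetical stand-in for the connection spec; field names mirror
    # the values printed in the debug log above.
    inventory_name: str
    transport: str
    mitogen_via: Optional[str] = None


def stack_from_spec(spec, spec_from_via, stack=(), seen_names=()):
    """Resolve the chain of connections, outermost hop first."""
    # Cycle check keyed on the inventory name alone.
    if spec.inventory_name in seen_names:
        chain = " -> ".join(reversed(seen_names + (spec.inventory_name,)))
        raise ViaCycleError(
            "mitogen_via=%s of %s creates a cycle (%s)"
            % (spec.mitogen_via, spec.inventory_name, chain)
        )
    # Follow mitogen_via first, remembering the name we came from.
    if spec.mitogen_via:
        stack = stack_from_spec(
            spec_from_via(spec.mitogen_via),
            spec_from_via,
            stack=stack,
            seen_names=seen_names + (spec.inventory_name,),
        )
    return stack + (spec.transport,)


# Assumed inventory: localhost has no via of its own.
hosts = {"localhost": Spec("localhost", "local")}

# As the log shows, the delegated task's spec reports inventory_name
# "localhost" while carrying the container's mitogen_via=localhost.
delegated = Spec("localhost", "docker", mitogen_via="localhost")

# For comparison: the same spec if it carried the container's own name.
named = Spec("docker-test-container", "docker", mitogen_via="localhost")


def demo():
    try:
        stack_from_spec(delegated, hosts.__getitem__)
    except ViaCycleError as exc:
        return str(exc)
    return None
```

In this model, `demo()` returns the same message as the UNREACHABLE output above, whereas `stack_from_spec(named, hosts.__getitem__)` resolves to a two-hop stack. That suggests the check trips because the delegated spec is identified by the delegating host's name rather than the container's.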
We feel this does not account for the fact that the task is being delegated to a container. Perhaps this is a special case in which delegation needs to be detected and the cycle allowed; however, I am unsure how that should be done, as I don't think we have access to that information here.
Having removed the cycle detection the play runs through smoothly.
Here are my Ansible playbook and Dockerfile, if they are of any use:
Playbook
---
- hosts: localhost
  gather_facts: no
  vars:
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: stopping any old test container
      docker_container:
        name: docker-test-container
        state: absent
      vars:
        ansible_python_interpreter: /usr/bin/python3
    - name: Wait for container to be stopped
      pause:
        seconds: 2
    - name: create a container
      docker_container:
        name: docker-test-container
        state: started
        image: test-docker-image:latest
    - name: add container to inventory
      add_host:
        name: docker-test-container
        ansible_user: test
        ansible_password: test
        ansible_ssh_port: 22
        ansible_connection: docker
        #ansible_connection: setns
        mitogen_kind: docker
        mitogen_via: localhost
    - name: gather and delegate facts
      setup:
        gather_subset: '!all:hardware'
      delegate_to: docker-test-container
      delegate_facts: true
    - name: delegate a task
      debug:
        msg: "Ansible_host: {{ ansible_host }}"
      delegate_to: docker-test-container
      delegate_facts: true
Dockerfile
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y python3 openssh-server sudo
RUN useradd -rm -d /home/ubuntu -s /bin/bash -g root -G sudo -u 1000 test
RUN echo 'test:test' | chpasswd
EXPOSE 22
RUN service ssh start
CMD ["/usr/sbin/sshd","-D"]
A little bit of extra feedback on this issue: we were having a similar problem using the LXD connection plugin, and commenting out the cycle detection seems to fix the issue for us too. Commenting out the following if statement means our playbooks work perfectly.
I'm helping with the OpenStack-Ansible integration, and we're currently running into an error when a host tries to delegate its facts to a container that it's hosting.
I've been able to reproduce this with a much simpler playbook and inventory than an OpenStack-Ansible deployment would be using.
playbook
inventory
Both the host and container are running Ubuntu 18.04.2 and Python 2.7.15rc1