hashicorp / vagrant

Vagrant is a tool for building and distributing development environments.
https://www.vagrantup.com

Ansible provisioner unable to provision hosts in parallel when using collections #12782

Open larsks opened 2 years ago

larsks commented 2 years ago

Vagrant version

Vagrant 2.2.16

Host operating system

Fedora 35

Guest operating system

Fedora 36

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.provider :libvirt do |libvirt|
    libvirt.uri = "qemu:///system"
    libvirt.memory = 4096
    libvirt.storage :file,
      :size => "10G",
      :type => "qcow2"
  end

  config.vm.box = "fedora/36-cloud-base"
  config.vm.box_version = "36-20220504.1"

  config.vm.define "server" do |server|
    server.vm.hostname = "server"
  end

  config.vm.define "client" do |client|
    client.vm.hostname = "client"
  end

  config.vm.provision :ansible do |ansible|
    ansible.compatibility_mode = "2.0"
    ansible.playbook = "provision/all.yaml"
    ansible.galaxy_role_file = "provision/requirements.yaml"
  end

end

Galaxy role file

collections:
  - community.general

Ansible playbook

- hosts: all
  become: true
  tasks:
    - name: create bridge device
      community.general.nmcli:
        type: bridge
        conn_name: br0
        ifname: br0
        state: present

Debug output

Attached to this issue: vagrant.log.txt

Expected behavior

Vagrant should successfully provision both hosts in parallel.

Actual behavior

Vagrant fails with:

==> client: Running provisioner: ansible...
==> server: Running provisioner: ansible...
    client: Running ansible-galaxy...
    server: Running ansible-galaxy...
Starting galaxy collection install process
Process install dependency map
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/community-general-5.0.0.tar.gz to /home/lars/.ansible/tmp/ansible-local-2837921qcczvbg/tmp9ysy8ja9/community-general-5.0.0-mxvhyy3t
Installing 'community.general:5.0.0' to '/home/lars/.ansible/collections/ansible_collections/community/general'
Starting collection install process
Downloading https://galaxy.ansible.com/download/community-general-5.0.0.tar.gz to /home/lars/.ansible/tmp/ansible-local-283797xmhhgys2/tmppw9s6q5h/community-general-5.0.0-ppavxysw
Installing 'community.general:5.0.0' to '/home/lars/.ansible/collections/ansible_collections/community/general'
ERROR! Unexpected Exception, this is probably a bug: [Errno 39] Directory not empty: 'modules'
to see the full traceback, use -vvv
==> server: Removing domain...
==> server: An error occurred. The error will be shown after all tasks complete.
community.general:5.0.0 was installed successfully
    client: Running ansible-playbook...
ERROR! couldn't resolve module/action 'community.general.nmcli'. This often indicates a misspelling, missing collection, or incorrect module path.

The error appears to be in '/home/lars/tmp/vagrant/bug/provision/all.yaml': line 11, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

    - name: create bridge device
      ^ here
==> client: Removing domain...
==> client: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'server'
machine. Please handle this error then try again:

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

An error occurred while executing the action on the 'client'
machine. Please handle this error then try again:

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

Running vagrant up --no-parallel avoids this error, but it can take substantially longer with more hosts and more complex playbooks.
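Another possible workaround (a sketch, not an official fix): since the failure above is two concurrent ansible-galaxy processes racing on the shared ~/.ansible/collections directory, installing the collections once, serially, before bringing the machines up should leave nothing for the parallel runs to fight over. This assumes galaxy_role_file is also removed from the Vagrantfile so Vagrant does not re-run ansible-galaxy per machine:

```shell
# Install the collections a single time, up front, so the two parallel
# provisioner runs never race on ~/.ansible/collections.
ansible-galaxy collection install -r provision/requirements.yaml

# Then bring the machines up in parallel as usual (with the
# galaxy_role_file line removed from the Vagrantfile).
vagrant up
```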

dkinzer commented 2 years ago

@larsks, just curious whether the pattern documented at https://www.vagrantup.com/docs/provisioning/ansible works for you:

# Vagrant 1.7+ automatically inserts a different
# insecure keypair for each new VM created. The easiest way
# to use the same keypair for all the machines is to disable
# this feature and rely on the legacy insecure key.
# config.ssh.insert_key = false
#
# Note:
# As of Vagrant 1.7.3, it is no longer necessary to disable
# the keypair creation when using the auto-generated inventory.

N = 3
(1..N).each do |machine_id|
  config.vm.define "machine#{machine_id}" do |machine|
    machine.vm.hostname = "machine#{machine_id}"
    machine.vm.network "private_network", ip: "192.168.77.#{20+machine_id}"

    # Run the Ansible provisioner only once,
    # after all the machines are up and ready.
    if machine_id == N
      machine.vm.provision :ansible do |ansible|
        # Disable default limit to connect to all the machines
        ansible.limit = "all"
        ansible.playbook = "playbook.yml"
      end
    end
  end
end

The way this pattern works is that it waits until all the machines are running, then runs the provisioner a single time against all of them.

This used to work fine for me, but recently it has stopped working. I think the issue might be a bug in the latest version of Vagrant.

larsks commented 2 years ago

@dkinzer that seems to work for me (using Vagrant 2.2.16 with libvirt), and it would work around the race condition in the parallel collection install.
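For reference, applying that pattern to the Vagrantfile from the original report would look roughly like this (a sketch; the libvirt provider block is omitted for brevity). The provisioner is attached only to the last-defined machine, so ansible-galaxy and ansible-playbook each run exactly once instead of once per guest, at the cost of provisioning no longer running in parallel (the machines themselves still boot in parallel):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "fedora/36-cloud-base"
  config.vm.box_version = "36-20220504.1"

  config.vm.define "server" do |server|
    server.vm.hostname = "server"
  end

  config.vm.define "client" do |client|
    client.vm.hostname = "client"

    # Attached to "client" only; because "client" is defined last, this
    # runs a single time, after both machines are up.
    client.vm.provision :ansible do |ansible|
      ansible.compatibility_mode = "2.0"
      # Disable the default limit so the one run provisions every machine,
      # not just "client".
      ansible.limit = "all"
      ansible.playbook = "provision/all.yaml"
      ansible.galaxy_role_file = "provision/requirements.yaml"
    end
  end
end
```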