lae / ansible-role-netbox

Cross-platform Ansible role for deploying NetBox, a DCIM/IPAM tool, in a production environment.
MIT License
201 stars, 72 forks

Fails to deploy on clean Debian 11 #159

Closed by martino87r 1 year ago

martino87r commented 1 year ago

Hi, the role fails to deploy on a clean Debian 11.7 host with Ansible 2.14:

TASK [lae.netbox : Gather OS specific variables] *************************************************************************************************************************************************************************************
task path: /home/martino/.ansible/roles/lae.netbox/tasks/load_variables.yml:2
fatal: [netbox-dev-local]: FAILED! => {
    "msg": "No file was found when using first_found."
}

I'm assuming the failure is related to this issue: https://github.com/ansible/ansible/issues/70772

The postgres role worked around this by ensuring that no undefined variables are involved in the matching:

...

# Variable configuration.
   - name: Include OS-specific variables (Debian).
     include_vars: "{{ ansible_distribution }}-{{ ansible_distribution_version.split('.')[0] }}.yml"
     when: ansible_os_family == 'Debian'  # <-- the relevant guard
...

lae commented 1 year ago

This role relies on the variables being defined so that we know what packages to install. It's not something we can just skip if the variable is undefined.

Can you run ansible-playbook prefixed with ANSIBLE_STDOUT_CALLBACK=debug (environment variable) and provide the output from that?

I believe Ansible grabs those variables from /etc/os-release, which is owned by the base-files package. Is this file/package possibly missing for you? I tested deployment on the latest Debian vagrant box (from a few days ago) and it seems to still function correctly.

martino87r commented 1 year ago

The issue is that when Ansible evaluates with_first_found with an undefined variable, it fails, hence the bug reference. This worked fine prior to 2.9. The variables are gathered from facts and are all there except for one, which comes from Red Hat based systems and is thus undefined.
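One way to tell an undefined fact apart from a missing file is to check the facts explicitly before the lookup runs. This is a minimal sketch, assuming the lookup depends on the four standard distribution facts listed below (the comment above does not say which one was suspected) and that fact gathering has already run. The task name and message are illustrative.

```yaml
# Sketch only: fail early, with a clear message, if any fact the file
# lookup depends on is undefined. The list of facts is an assumption.
- name: Assert that the distribution facts are defined
  ansible.builtin.assert:
    that:
      - ansible_distribution is defined
      - ansible_distribution_version is defined
      - ansible_distribution_major_version is defined
      - ansible_os_family is defined
    fail_msg: "A distribution fact is undefined; check that fact gathering is enabled."
    quiet: true
```

If this task passes and the lookup still fails, a missing variables file is a more likely cause than an undefined variable.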

I will create a PR if time permits, for now I've solved it like this:

- name: Include OS-specific variables (Debian).
  include_vars: "{{ ansible_distribution | lower }}-{{ ansible_distribution_version.split('.')[0] }}.yml"
  when: ansible_os_family == 'Debian'

- name: Include OS-specific variables (RedHat).
  include_vars: "{{ ansible_os_family | lower }}-{{ ansible_distribution_version.split('.')[0] }}.yml"
  when:
  - ansible_os_family == 'RedHat'

- name: Include OS-specific variables (Ubuntu).
  include_vars: "{{ ansible_distribution | lower }}-{{ ansible_distribution_version.split('.')[0] }}.yml"
  when: ansible_distribution == 'Ubuntu'  # ansible_os_family is 'Debian' on Ubuntu

lae commented 1 year ago

I couldn't reproduce it with Ansible 2.14.4, and I wasn't really sure of the relevance of the bug reference, sorry.

Are you talking about ansible_os_family being undefined? On the clean VM image I'm testing with it appears to be defined, but I don't know if that fact comes from some other package or file.

TASK [debug] *******************************************************************
ok: [default] => {
    "ansible_os_family": "Debian"
}

As for the postgres role, that snippet looks more like how they include variables in general, not necessarily a workaround for the issue you're referring to. (I can't find any reference to with_first_found in that repository in the first place.)

martino87r commented 1 year ago

That's definitely strange:

- name: Debug stuff
  debug:
    msg: "{{ ansible_distribution }} | {{ ansible_distribution_version }} | {{ ansible_distribution_major_version }} | {{ ansible_os_family }}"

- name: Gather OS specific variables
  include_vars: "{{ item }}"
  with_first_found:
    - "{{ ansible_distribution|lower }}-{{ ansible_distribution_version }}.yml"
    - "{{ ansible_distribution|lower }}-{{ ansible_distribution_major_version }}.yml"
    - "{{ ansible_distribution|lower }}.yml"
    - "{{ ansible_os_family|lower }}-{{ ansible_distribution_major_version }}.yml"

Output:

TASK [lae.netbox : Debug stuff] ******************************************************************************************************************************************************************************************************
ok: [netbox-dev-local] => {}

MSG:

Debian | 11 | 11 | Debian

TASK [lae.netbox : Gather OS specific variables] *************************************************************************************************************************************************************************************
fatal: [netbox-dev-local]: FAILED! => {}

MSG:

No file was found when using first_found.

PLAY RECAP ***************************************************************************************************************************************************************************************************************************
netbox-dev-local           : ok=57   changed=0    unreachable=0    failed=1    skipped=46   rescued=0    ignored=0   

I've also checked with ANSIBLE_DEBUG enabled: the variables are resolved correctly, but for some reason it still refuses to load the variables file. Any ideas?

 82146 1683215786.56213: search_path:
        /home/martino/.ansible/roles/lae.netbox/vars/debian-11.yml
        /home/martino/.ansible/roles/lae.netbox/debian-11.yml
        /home/martino/.ansible/roles/lae.netbox/tasks/vars/debian-11.yml
        /home/martino/.ansible/roles/lae.netbox/tasks/debian-11.yml
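
When none of the paths in a search like the one above exists, the stock error does not say which candidates were tried. The sketch below is one possible way to surface that, not the role's actual implementation: it runs the same candidate list through the first_found lookup with `skip: true`, so the lookup returns an empty list instead of failing, and then fails with a more descriptive message. The variable names `netbox_os_vars_lookup` and `netbox_os_vars_matches` are hypothetical, and the sketch assumes the files live in the role's vars directory.

```yaml
# Sketch only: variable names are hypothetical, not part of lae.netbox.
- name: Look for an OS-specific variables file without failing
  ansible.builtin.set_fact:
    netbox_os_vars_matches: "{{ query('ansible.builtin.first_found', netbox_os_vars_lookup) }}"
  vars:
    netbox_os_vars_lookup:
      files:
        - "{{ ansible_distribution | lower }}-{{ ansible_distribution_version }}.yml"
        - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_distribution | lower }}.yml"
        - "{{ ansible_os_family | lower }}-{{ ansible_distribution_major_version }}.yml"
      paths:
        - "{{ role_path }}/vars"
      skip: true  # return an empty list instead of raising an error

- name: Explain the failure when no candidate file exists
  ansible.builtin.fail:
    msg: >-
      No variables file for {{ ansible_distribution }}
      {{ ansible_distribution_version }} was found in {{ role_path }}/vars.
      The installed copy of the role may predate support for this release.
  when: netbox_os_vars_matches | length == 0

- name: Gather OS specific variables
  ansible.builtin.include_vars: "{{ netbox_os_vars_matches | first }}"
```

The role needs these variables to know which packages to install, so the sketch still fails when nothing matches. Only the error message changes.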
martino87r commented 1 year ago

OK, mystery solved: I pulled the role through Galaxy, and apparently debian-11.yml is missing from that copy, probably because the artifact on Galaxy had not yet been updated from this repo.

On top of that, I had also cloned the actual repo and mixed the two up while editing.

lae commented 1 year ago

Ah. You're right. Give me a moment to update it.

Unfortunately, it doesn't auto-update anymore since Travis CI is dead now. I should fix that sometime.
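For illustration, automatic updates could be restored with a CI job that triggers a Galaxy import whenever a release is published. The following is a hedged sketch of a GitHub Actions workflow, not something taken from this repository: the workflow file, the job name, and the secret name `GALAXY_API_KEY` are all assumptions.

```yaml
# Sketch only: hypothetical .github/workflows/galaxy.yml
name: Import role into Ansible Galaxy
on:
  release:
    types: [published]
jobs:
  galaxy-import:
    runs-on: ubuntu-latest
    steps:
      - name: Install ansible-core (provides the ansible-galaxy CLI)
        run: python3 -m pip install ansible-core
      - name: Ask Galaxy to import the role from GitHub
        # GALAXY_API_KEY is an assumed repository secret holding a Galaxy token.
        run: ansible-galaxy role import --token "${{ secrets.GALAXY_API_KEY }}" lae ansible-role-netbox
```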

martino87r commented 1 year ago

I believe once you update it we can close this as it should work fine :dancers:

lae commented 1 year ago

Updated the Galaxy artifact, though I just checked the change history since v1.0.3 and the Debian 11 definitions aren't actually in that release anyway, so you'd still need to use the git repo with ansible-galaxy.
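As a sketch of what using the git repo with ansible-galaxy can look like, a requirements file can point the role at the repository instead of the Galaxy artifact. The file name `requirements.yml` is the usual convention, and keeping the role name `lae.netbox` is an assumption made so that existing playbooks keep working.

```yaml
# requirements.yml (sketch): install the role straight from the git repository.
roles:
  - name: lae.netbox
    src: https://github.com/lae/ansible-role-netbox.git
    scm: git
    # No version is given, so the repository's default branch is used.
```

Installing with `ansible-galaxy install -r requirements.yml --force` would replace an older copy that was previously pulled from Galaxy.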

I'll see if I can just go ahead and cut a new release anyway and publish that.

martino87r commented 1 year ago

Simply following the documentation in the README leads into this hole if you're unlucky enough to be running Debian >= 11. It would be awesome if the Galaxy artifact included the updates. I'd guess that adding support for a major Debian version warrants a release, but that's up to you.

lae commented 1 year ago

It definitely warrants a release. I think I just forgot. I'm performing a full test locally first to make sure nothing else is broken.

lae commented 1 year ago

Alright, done. https://github.com/lae/ansible-role-netbox/releases/tag/v1.0.4
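
To pick up this release explicitly rather than whatever copy is already installed, the role version can be pinned in a requirements file. This is a minimal sketch that assumes the Galaxy version string matches the git tag `v1.0.4` from the link above.

```yaml
# requirements.yml (sketch): pin the role to the release that adds Debian 11 support.
roles:
  - name: lae.netbox
    version: v1.0.4  # assumed to match the git tag
```

Because an older copy may already be present under ~/.ansible/roles, `ansible-galaxy install -r requirements.yml --force` is the safer way to install it.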