linux-system-roles / test-harness

Test harness for linux system roles
GNU General Public License v3.0

CI tests with collections option fail with "__spec__ is None" if the role has module_utils #124

Open nhosoi opened 3 years ago

nhosoi commented 3 years ago

Roles affected: certificate, network, storage

Failed test example: https://fedorapeople.org/groups/linuxsystemroles/logs/linux-system-roles-network-pull-linux-system-roles_network-319-a4d19a4-centos-8-20201122-233541/artifacts/ansible.log

The full traceback is:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/ansible/executor/task_executor.py", line 147, in run
    res = self._execute()
  File "/usr/lib/python3.8/site-packages/ansible/executor/task_executor.py", line 665, in _execute
    result = self._handler.run(task_vars=variables)
  File "/usr/lib/python3.8/site-packages/ansible/plugins/action/normal.py", line 46, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "/usr/lib/python3.8/site-packages/ansible/plugins/action/__init__.py", line 825, in _execute_module
    (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)
  File "/usr/lib/python3.8/site-packages/ansible/plugins/action/__init__.py", line 206, in _configure_module
    (module_data, module_style, module_shebang) = modify_module(module_name, module_path, module_args, self._templar,
  File "/usr/lib/python3.8/site-packages/ansible/executor/module_common.py", line 1250, in modify_module
    (b_module_data, module_style, shebang) = _find_module_utils(module_name, b_module_data, module_path, module_args, task_vars, templar, module_compression,
  File "/usr/lib/python3.8/site-packages/ansible/executor/module_common.py", line 1089, in _find_module_utils
    recursive_finder(module_name, remote_module_fqn, b_module_data, py_module_names,
  File "/usr/lib/python3.8/site-packages/ansible/executor/module_common.py", line 880, in recursive_finder
    recursive_finder(py_module_file[-1], next_fqn, py_module_cache[py_module_file][0],
  File "/usr/lib/python3.8/site-packages/ansible/executor/module_common.py", line 725, in recursive_finder
    module_info = CollectionModuleInfo(py_module_name[-idx],
  File "/usr/lib/python3.8/site-packages/ansible/executor/module_common.py", line 645, in __init__
    self.get_source()
  File "/usr/lib/python3.8/site-packages/ansible/executor/module_common.py", line 661, in get_source
    data = pkgutil.get_data(to_native(self._package_name), to_native(self._mod_name + '.py'))
  File "/usr/lib64/python3.8/pkgutil.py", line 619, in get_data
    spec = importlib.util.find_spec(package)
  File "/usr/lib64/python3.8/importlib/util.py", line 114, in find_spec
    raise ValueError('{}.__spec__ is None'.format(name))
ValueError: ansible_collections.fedora.system_roles.plugins.module_utils.network_lsr.nm.provider.__spec__ is None
fatal: [/cache/centos-8.qcow2]: FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}
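The last frames of the traceback come from the standard library: pkgutil.get_data() calls importlib.util.find_spec(), which raises this ValueError when the package is already present in sys.modules but its __spec__ is None. The minimal sketch below reproduces only that final step, using a made-up package name (demo_module_utils_pkg is hypothetical). It does not reproduce how the Ansible 2.9 collection loader ends up registering such a module.

```python
import pkgutil
import sys
import types

# Hypothetical package name standing in for the module_utils package in the traceback.
PKG = "demo_module_utils_pkg"

# types.ModuleType() creates a module object whose __spec__ attribute is None.
sys.modules[PKG] = types.ModuleType(PKG)

error_message = None
try:
    # pkgutil.get_data() first calls importlib.util.find_spec(PKG). Because PKG
    # is already in sys.modules and its __spec__ is None, find_spec raises.
    pkgutil.get_data(PKG, "provider.py")
except ValueError as exc:
    error_message = str(exc)
finally:
    # Remove the fake module so it does not affect later imports.
    del sys.modules[PKG]

print(error_message)  # demo_module_utils_pkg.__spec__ is None
```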

The failure reproduces only when run-tests is run in docker/podman/openshift; the same tests pass when run-tests is executed locally.

Version info:

ansible 2.9.14
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/tester/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.8/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.8.6 (default, Sep 25 2020, 00:00:00) [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)]

The failed backtraces above look quite similar to the ones in these ansible-freeipa issues:
https://github.com/freeipa/ansible-freeipa/issues/230
https://github.com/freeipa/ansible-freeipa/issues/144
https://github.com/freeipa/ansible-freeipa/issues/315

There is also a discussion in the Ansible project itself:
https://github.com/ansible/ansible/issues/68361

According to ansible-freeipa PR #207, the problem is fixed in Ansible 2.10:
https://github.com/freeipa/ansible-freeipa/issues/207

I think we need to find a workaround for Ansible 2.9.
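A rough sketch of how the harness could decide when to apply such a workaround, assuming (as stated in this thread) that the problem is fixed starting with Ansible 2.10. needs_module_utils_workaround is a hypothetical helper that parses only the first line of `ansible --version` output, such as the version info shown above.

```python
import re

# Assumption taken from this thread: the module_utils problem is fixed in Ansible 2.10.
FIXED_IN = (2, 10)


def needs_module_utils_workaround(version_output):
    """Return True if `ansible --version` output reports a release older than FIXED_IN.

    Hypothetical helper: it reads the major.minor numbers from the first line,
    e.g. "ansible 2.9.14", and ignores the rest (such as the python version line).
    """
    first_line = version_output.strip().splitlines()[0]
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError("cannot parse Ansible version from {!r}".format(first_line))
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < FIXED_IN
```

For example, it returns True for the "ansible 2.9.14" output above and False for a first line such as "ansible 2.10.3".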

Another, unrelated note: ANSIBLE_COLLECTIONS_PATHS is likely deprecated in Ansible 2.10 and replaced by ANSIBLE_COLLECTIONS_PATH.
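If the harness has to support both releases during the transition, one option is to export both variable names with the same value, assuming 2.9 reads ANSIBLE_COLLECTIONS_PATHS and 2.10 reads ANSIBLE_COLLECTIONS_PATH. This is only a sketch under that assumption; collections_path_env is a hypothetical helper and the paths passed to it are placeholders.

```python
import os


def collections_path_env(paths, base=None):
    """Return a copy of `base` (default: os.environ) with the collections path set.

    Hypothetical helper: both spellings get the same value, so one environment
    should work whichever variable name the installed Ansible reads.
    """
    env = dict(os.environ if base is None else base)
    value = os.pathsep.join(paths)
    env["ANSIBLE_COLLECTIONS_PATH"] = value   # name used by Ansible 2.10 (per the note above)
    env["ANSIBLE_COLLECTIONS_PATHS"] = value  # older name used by Ansible 2.9
    return env
```

The returned dict can be passed as env= to subprocess.run() when launching ansible-playbook.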

nhosoi commented 3 years ago

Response from @t-woerner

Yes, this is a known Ansible issue: https://github.com/ansible/ansible/issues/68361

ansible-freeipa already ran into this months ago, and I have also raised it in
the system-roles meetings before. So far, this is fixed in Ansible 2.10. A
backport to 2.9 was planned, but I am not sure that it happened.

The source of this issue is that Ansible tries to find out which files need
to be transferred from the controller to the node to be able to execute the
task. The new collection code does this differently and fails if there are
bindings that cannot be imported, even though the bindings might exist only
on the node and not on the controller. Ansible 2.10 has a fix for this, but a
backport to the release that you are using in the test seems to be missing. I
have not tested this lately.
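To illustrate discovering dependencies by reading source rather than by importing it, here is a simplified, hypothetical finder that uses the ast module to list collection module_utils imports without executing anything. It is not Ansible's recursive_finder, which, as the traceback shows, also calls pkgutil.get_data() through CollectionModuleInfo; it only shows that static discovery does not require a node-only binding to be importable on the controller.

```python
import ast

# Substring that marks collection module_utils imports, as in the traceback's module name.
MODULE_UTILS_MARKER = ".plugins.module_utils"


def find_module_utils_imports(source):
    """Collect dotted names of collection module_utils referenced by `source`.

    The source is only parsed, never executed, so imports of bindings that
    exist only on the managed node are never attempted here. For
    `from X import Y`, Y may be a submodule or just an attribute of X;
    this sketch records "X.Y" either way.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if MODULE_UTILS_MARKER in alias.name:
                    found.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports like "from . import x"; skipped here.
            if MODULE_UTILS_MARKER in node.module:
                for alias in node.names:
                    found.add(node.module + "." + alias.name)
    return found
```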

A possible workaround is to also install the bindings used by the module_utils
script on the controller. Another solution is to move all imports in the
module_utils script into a try/except block, but this requires special error
handling in the modules later on.
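A minimal sketch of the try/except approach for a module_utils file, assuming a binding that may be missing on the controller. node_only_binding and check_node_only_binding are hypothetical names; the "special error handling" mentioned above corresponds to the check each module would run before using the binding.

```python
# Hypothetical module_utils file. "node_only_binding" stands in for a binding
# that is installed on the managed node but possibly not on the controller.
try:
    import node_only_binding  # noqa: F401  (hypothetical module name)
    HAS_NODE_ONLY_BINDING = True
    NODE_ONLY_BINDING_IMPORT_ERROR = None
except ImportError as exc:
    node_only_binding = None
    HAS_NODE_ONLY_BINDING = False
    # Keep a reference: Python 3 deletes `exc` when the except block ends.
    NODE_ONLY_BINDING_IMPORT_ERROR = exc


def check_node_only_binding():
    """Return an error message if the binding is missing, else None.

    Modules must call this before touching node_only_binding, instead of
    letting a later attribute access on None fail with a confusing error.
    """
    if HAS_NODE_ONLY_BINDING:
        return None
    return "Python binding 'node_only_binding' is required on the managed node: {}".format(
        NODE_ONLY_BINDING_IMPORT_ERROR
    )
```

In a real module, a non-None message would typically be passed to AnsibleModule.fail_json(msg=...).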

nhosoi commented 3 years ago

I tested with the latest version of Ansible 2.9 [0] and got the same failure [1].

[0]

$ ansible --version
ansible 2.9.15
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/tester/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.8/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.8.6 (default, Sep 25 2020, 00:00:00) [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)]

[1]

  File "/usr/lib64/python3.8/importlib/util.py", line 114, in find_spec
    raise ValueError('{}.__spec__ is None'.format(name))
ValueError: ansible_collections.fedora.system_roles.plugins.module_utils.network_lsr.nm.provider.__spec__ is None