nautobot / nautobot-ansible

Ansible Collection for managing Nautobot Data
https://nautobot-ansible.readthedocs.io/en/latest/
GNU General Public License v3.0

Inventory failure IncompleteRead #406

Open EdificomSA opened 3 months ago

EdificomSA commented 3 months ago
ISSUE TYPE
SOFTWARE VERSIONS
pynautobot:

2.2.1

Ansible:

2.15.9

Nautobot:

2.2.6

Collection:

networktocode.nautobot 5.2.1

SUMMARY

AWX inventory sync randomly fails

STEPS TO REPRODUCE
plugin: networktocode.nautobot.inventory
#api_endpoint: # NAUTOBOT_URL
#token: 1234567890123456478901234567  # NAUTOBOT_TOKEN
config_context: true
flatten_config_context: true
flatten_local_context_data: true
virtual_chassis_name: true
plurals: false
#group_names_raw: True
fetch_all: false
max_uri_length: 1000
services: true
group_by:
- tenant
- tenant_group
- location
- rack
- rack_group
- rack_role
- tag
- role
- device_type
- manufacturer
- platform
- cluster
- cluster_type
- cluster_group
- is_virtual
- status

query_filters:
- status: Active
- has_primary_ip: 'true'
- role: Virtual Server
- role: Physical Server
EXPECTED RESULTS

Based on the idea from #367, it might be a good idea to add a retry on error in fetch_api_docs, like in #338 for the lookup.

ACTUAL RESULTS
ansible-inventory [core 2.15.9]
  config file = /runner/project/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /runner/requirements_collections:/root/.ansible/collections:/usr/share/ansible/collections:/usr/share/automation-controller/collections
  executable location = /usr/local/bin/ansible-inventory
  python version = 3.9.18 (main, Jan 24 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] (/usr/bin/python3)
  jinja version = 3.1.3
  libyaml = True
Using /runner/project/ansible.cfg as config file
host_list declined parsing /runner/project/inventories/nautobot-servers.yml as it did not pass its verify_file() method
script declined parsing /runner/project/inventories/nautobot-servers.yml as it did not pass its verify_file() method
Using inventory plugin 'ansible_collections.networktocode.nautobot.plugins.inventory.inventory' to process inventory source '/runner/project/inventories/nautobot-servers.yml'
Fetching: https://nautobot.xxx/api/docs/?format=openapi
toml declined parsing /runner/project/inventories/nautobot-servers.yml as it did not pass its verify_file() method
[WARNING]:  * Failed to parse /runner/project/inventories/nautobot-servers.yml
with auto plugin: IncompleteRead(13326730 bytes read, 2155194 more expected)
  File "/usr/local/lib/python3.9/site-packages/ansible/inventory/manager.py", line 293, in parse_source
    plugin.parse(self._inventory, self._loader, source, cache=cache)
  File "/usr/local/lib/python3.9/site-packages/ansible/plugins/inventory/auto.py", line 59, in parse
    plugin.parse(inventory, loader, path, cache=cache)
  File "/runner/requirements_collections/ansible_collections/networktocode/nautobot/plugins/inventory/inventory.py", line 1358, in parse
    self.main()
  File "/runner/requirements_collections/ansible_collections/networktocode/nautobot/plugins/inventory/inventory.py", line 1279, in main
    self.fetch_api_docs()
  File "/runner/requirements_collections/ansible_collections/networktocode/nautobot/plugins/inventory/inventory.py", line 1024, in fetch_api_docs
    openapi = self._fetch_information(self.api_endpoint + "/api/docs/?format=openapi")
  File "/runner/requirements_collections/ansible_collections/networktocode/nautobot/plugins/inventory/inventory.py", line 308, in _fetch_information
    raw_data = to_text(response.read(), errors="surrogate_or_strict")
  File "/usr/lib64/python3.9/http/client.py", line 476, in read
    s = self._safe_read(self.length)
  File "/usr/lib64/python3.9/http/client.py", line 628, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
joewesch commented 3 months ago

I'm not sure this error is the same as the issues you linked. The standard modules and the lookup module use pynautobot, which has the retries argument available. The inventory modules, however, use Ansible's standard open_url function.

Secondly, HTTP or Connection errors should raise during the section of the code above where your error got raised: https://github.com/nautobot/nautobot-ansible/blob/900395a17d199fe1816a887fee4cc7992a01e56a/plugins/inventory/inventory.py#L296-L305

This leads me to believe it got a response, but the response was incomplete in some way.

So, it's not as easy a fix as just enabling retries. Not impossible, but it will probably take extra work to implement.
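To illustrate the kind of extra work involved, here is a minimal sketch of a retry wrapper around the body read, which is where the traceback shows the error being raised. It is not the plugin's code: `read_with_retry`, `open_response`, `retries`, and `delay` are hypothetical names, and `open_response` stands in for whatever performs the request (for example a lambda wrapping `open_url`).

```python
import time
from http.client import IncompleteRead


def read_with_retry(open_response, retries=3, delay=1.0):
    """Fetch and read a response body, retrying when the body is truncated.

    open_response is a hypothetical zero-argument callable that performs the
    request and returns a file-like response object. The whole request is
    repeated on each attempt, because a truncated body cannot be resumed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return open_response().read()
        except IncompleteRead as exc:
            # The server answered, but the connection closed before the
            # advertised Content-Length was delivered.
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise last_error
```

The retry has to wrap both the request and the read, since the existing error handling around the request itself never sees this exception.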

EdificomSA commented 3 months ago

You're right, it's probably not linked to the issues I mentioned.

I checked urllib3's Retry, and in fact it does not handle the IncompleteRead exception: https://github.com/nautobot/pynautobot/blob/develop/pynautobot/core/api.py#L91

So somehow, fetching the API spec at https://nautobot.xxx/api/docs/?format=openapi sometimes fails with an IncompleteRead.

I'll try to solve the issue at its core, meaning the Nautobot API response, though I'm unsure where to look (timeout?).
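For reference, `http.client` raises IncompleteRead when the connection ends before the number of bytes announced in Content-Length has arrived, which matches the "bytes read, more expected" wording in the warning above. A minimal sketch that reproduces this without a network, using a hypothetical `FakeSocket` helper that replays a canned, truncated response:

```python
import io
from http.client import HTTPResponse, IncompleteRead


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, payload):
        self._payload = payload

    def makefile(self, *args, **kwargs):
        # HTTPResponse only needs a readable binary file object.
        return io.BytesIO(self._payload)


def read_truncated_response():
    # The headers promise 100 bytes, but only 10 are sent before EOF.
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 100\r\n"
        b"Connection: close\r\n"
        b"\r\n"
        b"0123456789"
    )
    response = HTTPResponse(FakeSocket(raw))
    response.begin()  # parse the status line and headers
    try:
        response.read()
    except IncompleteRead as exc:
        # partial holds the bytes received, expected the bytes still missing
        return len(exc.partial), exc.expected
    return None
```

Here the function returns the count of bytes received and the count still missing. In the real failure, anything that cuts the connection mid-body (a proxy, a worker timeout on the server side, and so on) could produce the same exception, so this only shows the mechanism, not the cause.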

EdificomSA commented 5 days ago

I'm not 100% sure yet, but it appears that the inventory plugin's nominal default timeout of 60s is not set anywhere in https://github.com/nautobot/nautobot-ansible/blob/develop/plugins/inventory/inventory.py and that the plugin instead relies on the default 10s timeout of open_url in module_utils: https://github.com/ansible/ansible/blob/stable-2.15/lib/ansible/module_utils/urls.py#L1673

joewesch commented 4 days ago

> it appears that the Inventory plugin timeout default theoretical 60s value is not set anywhere

The default of 60s is indeed set in the documentation, on line 169. Inventory plugins are "documentable" plugins, meaning the options are set automatically via the YAML-styled documentation rather than needing to be set in Python as you do for standard Ansible modules.

I was able to confirm this by adding the following lines in my local dev environment:

        self.timeout = self.get_option("timeout")
        self.display.v("timeout: %s" % self.timeout)

And then I was able to see it in the output:

# inventory.yml
---
plugin: networktocode.nautobot.inventory
api_endpoint: https://demo.nautobot.com/
token: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
❯ ansible-inventory -i inventory.yml --list -vvv
...
Using /root/nautobot-ansible/ansible.cfg as config file
timeout: 60
Fetching: https://demo.nautobot.com/api/docs/?format=openapi
...