netbox-community / netbox

The premier source of truth powering network automation. Open source under Apache 2. Try NetBox Cloud free: https://netboxlabs.com/free-netbox-cloud/
http://netboxlabs.com/oss/netbox/
Apache License 2.0
15.97k stars 2.56k forks source link

v3.4-beta1: YAML import raises AttributeError exception #11000

Closed peteeckel closed 1 year ago

peteeckel commented 1 year ago

NetBox version

v3.4-beta1

Python version

3.8

Steps to Reproduce

  1. Click on "IP Addresses", "Import"
  2. Select YAML as the input format
  3. Paste the following valid YAML into the form field:
    - address: 10.0.1.1/24
    status: active

Expected Behavior

The data is imported and a new IP address appears in the database

Observed Behavior

An AttributeError exception is raised:

Environment:

Request Method: POST
Request URL: http://192.168.106.105/ipam/ip-addresses/import/

Django Version: 4.1.2
Python Version: 3.8.11
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'django.contrib.humanize',
 'corsheaders',
 'debug_toolbar',
 'graphiql_debug_toolbar',
 'django_filters',
 'django_tables2',
 'django_prometheus',
 'graphene_django',
 'mptt',
 'rest_framework',
 'social_django',
 'taggit',
 'timezone_field',
 'circuits',
 'dcim',
 'ipam',
 'extras',
 'tenancy',
 'users',
 'utilities',
 'virtualization',
 'wireless',
 'django_rq',
 'drf_yasg',
 'netbox_dns.DNSConfig']
Installed Middleware:
['graphiql_debug_toolbar.middleware.DebugToolbarMiddleware',
 'django_prometheus.middleware.PrometheusBeforeMiddleware',
 'corsheaders.middleware.CorsMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'django.middleware.security.SecurityMiddleware',
 'netbox.middleware.ExceptionHandlingMiddleware',
 'netbox.middleware.RemoteUserMiddleware',
 'netbox.middleware.LoginRequiredMiddleware',
 'netbox.middleware.DynamicConfigMiddleware',
 'netbox.middleware.APIVersionMiddleware',
 'netbox.middleware.ObjectChangeMiddleware',
 'django_prometheus.middleware.PrometheusAfterMiddleware']

Traceback (most recent call last):
  File "/opt/netbox/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/opt/netbox/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/netbox/lib/python3.8/site-packages/django/views/generic/base.py", line 103, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/netbox/netbox/netbox/views/generic/base.py", line 13, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/opt/netbox/netbox/utilities/views.py", line 99, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/opt/netbox/lib/python3.8/site-packages/django/views/generic/base.py", line 142, in dispatch
    return handler(request, *args, **kwargs)
  File "/opt/netbox/netbox/netbox/views/generic/bulk_views.py", line 442, in post
    new_objs = self.create_and_update_objects(form, request)
  File "/opt/netbox/netbox/netbox/views/generic/bulk_views.py", line 364, in create_and_update_objects
    prefetch_ids = [int(record['id']) for record in records if record.get('id')]
  File "/opt/netbox/netbox/netbox/views/generic/bulk_views.py", line 364, in <listcomp>
    prefetch_ids = [int(record['id']) for record in records if record.get('id')]

Exception Type: AttributeError at /ipam/ip-addresses/import/
Exception Value: 'list' object has no attribute 'get'
jeremystretch commented 1 year ago

I'm curious whether the expectation from the user perspective is to provide a list of YAML dictionaries, or separate YAML documents representing each object. The YAML loader expects a list of individual documents (separated with ---).

peteeckel commented 1 year ago

Obviously my expectation was to be able to provide the data in a single YAML array :-)

Anyway, after looking at the code I also tried

- address: 10.0.1.1/24
  status: active
---
- address: 10.0.1.2/24
  status: active

(with various different locations of the separators). Always with the same result.

The problem seems to originate in the fact that whether you provide a single YAML document or a list of multiple documents (of which a list containing one document as I did in my minimalised exaxmple), yaml.load_all() always returns the parsed data itself as a list containing one or more lists of dictionaries:

- address: 10.0.1.0/24
  status: active
- address: 10.0.1.1/24
  status: active

[[{'address': '10.0.1.0/24', 'status': 'active'}, {'address': '10.0.1.1/24', 'status': 'active'}]]
- address: 10.0.1.0/24
  status: active
---
- address: 10.0.1.1/24
  status: active

[[{'address': '10.0.1.0/24', 'status': 'active'}], [{'address': '10.0.1.1/24', 'status': 'active'}]]

Iterating over the result will always give lists as items, and obviously .get() won't work on them.

Personally, I'd flatten the list and iterate over all dictionaries, whether they come in one or several YAML documents. I can't really see the advantage of one or the other structure (maybe I'm missing something, though). I normally provide YAML data in one large document, but there may be data sources that use the multi-document approach. It would be sensible to be able to handle both.

kkthxbye-code commented 1 year ago

Just for clarity, the following is the format currently supported:

address: 10.0.1.1/24
status: active
---
address: 10.0.1.2/24
status: active
peteeckel commented 1 year ago

Hm. That is probably not what anyone would expect when YAML is mentioned. It definitely is a very special case that does not make too much sense to me.

What I would expect in the first place is a list in one document. That's what most tools using YAML export eject, and it's the canonical output when you give yaml.dump with a datastructure roughly equivalent to a CSV:

#!/usr/bin/env python3

import yaml

struct = [
    { "test1": 1, "test2": 2 },
    { "test1": 2, "test2": 4 },
]

print(yaml.dump(struct))

results in

- test1: 1
  test2: 2
- test1: 2
  test2: 4

The multi-document approach is nothing I see very often in everyday YAML practice. It doesn't hurt to support it, but I obviously didn't even have the idea of trying it. The currently supported notation does, however definitely make more sense than the multi-document form of one list element per document I tried.

Anyway, I'd still go for a list of dictionaries.

peteeckel commented 1 year ago

I have a suggestion for a minimal code change that would support at least the three variants discussed here:

diff --git a/netbox/utilities/forms/forms.py b/netbox/utilities/forms/forms.py
index 5756cf0e3..c361f13c3 100644
--- a/netbox/utilities/forms/forms.py
+++ b/netbox/utilities/forms/forms.py
@@ -207,12 +207,24 @@ class ImportForm(BootstrapMixin, forms.Form):

     def _clean_yaml(self, data):
         try:
-            return yaml.load_all(data, Loader=yaml.SafeLoader)
+            records = list(yaml.load_all(data, Loader=yaml.SafeLoader))
         except yaml.error.YAMLError as err:
             raise forms.ValidationError({
                 self.data_field: f"Invalid YAML data: {err}"
             })

+        try:
+            records = sum(records, [])
+        except TypeError:
+            pass
+
+        if any(type(record)!=dict for record in records):
+            raise forms.ValidationError({
+                self.data_field: "Invalid YAML data: A list of dictionaries is required"
+            })
+
+        return records
+

 class FilterForm(BootstrapMixin, forms.Form):
     """

The above code can handle three variants of YAML structure:

variant1 = '''
- address: 10.0.1.1/24
  status: active
- address: 10.0.1.2/24
  status: active
- address: 10.0.1.3/24
  status: active
'''

variant2 = '''
- address: 10.0.1.1/24
  status: active
- address: 10.0.1.2/24
  status: active
---
- address: 10.0.1.3/24
  status: active
'''

variant3 = '''
address: 10.0.1.1/24
status: active
---
address: 10.0.1.2/24
status: active
---
address: 10.0.1.3/24
status: active
'''

Update: It was still possible to input valid YAML that caused Exceptions. The code now makes sure that what is returned is a list of dictionaries.

peteeckel commented 1 year ago

What is feeding my argument as well is that the JSON importer accepts the following data structure:

[
    {
        "address": "10.1.1.1/24",
        "status": "active"
    },
    {
        "address": "10.1.1.2/24",
        "status": "active"
    }
]

This is the exact equivalent of the YAML structure

- address: 10.1.1.1/24
  status: active
- address: 10.1.1.2/24
  status: active

So it really makes sense to support it in addition to the multi-document variant that js supported at the moment. Given the fact that it is currently possible to submit valid YAML structures that cause exceptions trying to parse them _clean_yaml() needs to be revised anyway.

kkthxbye-code commented 1 year ago

it's the canonical output when you give yaml.dump with a datastructure roughly equivalent to a CSV:

While I don't have strong feelings either way, if you use yaml.dump_all() it matches the output netbox expects. Which makes sense as we load using yaml.load_all(). Personally I think it would be best to specify one format, but I don't have any good arguments for or against.

Whatever we decide, we should at least support the format that the DeviceType export function exports in, which is currently multiple yaml documents via. dump_all(), so it needs to be changed if we change the format.

peteeckel commented 1 year ago

I made an experiment, just to check my own perception (sometimes you live in a strange bubble), and asked a few colleagues what they think of first when they should represent the data structure of multiple records with specific fields as a YAML structure. All of them came up with the 'list of dictionaries' idea, some of them didn't even know that the multi-document approach exists (I did, but it was stored away in some very remote attic of my brain).

What's the rationale behind using dump_all()? Is there any use case that makes dump() impractical or less obvious? I'm trying to figure out the advantage of using what I feel is a slightly exotic format.

Anyway, IMHO it makes sense to be tolerant in what you accept and strict in what you emit - so I'd really prefer the approach outlined above of accepting any structure that represents the record structure that needs to be imported.

By the way, I tried to import

{
    "address": "10.1.5.1/24",
    "status": "active"
},
{
    "address": "10.1.5.2/24",
    "status": "active"
}

(with and without the comma between the dictionaries) as JSON input - both were not accepted. So in JSON the only valid form structure is what would, when parsed as YAML (which is possible, JSON usually is valid YAML as well) be represented as

- address: 10.1.5.1/24
  status: active
- address: 10.1.5.2/24
  status: active
jeremystretch commented 1 year ago

Hm. That is probably not what anyone would expect when YAML is mentioned. It definitely is a very special case that does not make too much sense to me.

This is the same way the device type YAML import has worked for years.

peteeckel commented 1 year ago

This is the same way the device type YAML import has worked for years.

Never mind - I can adjust to that, my point is just that IMHO it is a quite unusual variant of YAML.

The main point of this issue still remains - getting the input format wrong while providing a valid YAML structure should not cause an exception later in the process, but a ValidationError while cleaning the input.

The current code only checks for YAML validity (a test that all of my input variants passed) in _clean_yaml and then an exception is raised later when the parsed data structure does not match the implicitly expected format.