netbox-community / netbox

The premier source of truth powering network automation. Open source under Apache 2. Try NetBox Cloud free: https://netboxlabs.com/free-netbox-cloud/
http://netboxlabs.com/oss/netbox/
Apache License 2.0
15.82k stars 2.54k forks source link

API - Description Document multiple issues #13370

Closed commonism closed 1 year ago

commonism commented 1 year ago

NetBox version

v3.5.6 - netbox-docker

Python version

3.10

Steps to Reproduce

This won't work for you, it's just meant to be a preview how things currently look like. Does not have to be async - it's just what I started with.

import logging
from typing import Dict
from pathlib import Path

import aiopenapi3

import httpx
import pytest
import pytest_asyncio

from netbox_webhook.netbox import sign
from netbox_webhook.models import Event

#app.debug = True

import aiopenapi3.plugin
class SetServer(aiopenapi3.plugin.Document):
    def parsed(self, ctx: "Document.Context") -> "Document.Context":
        ctx.document["servers"][0]["url"] = "/"
        return ctx

class FixSchemas(aiopenapi3.plugin.Init):
    def schema(self, ctx: "Init.Context") -> "Init.Context":
        from aiopenapi3 import v30
        ctx.schema: Dict[str, v30.Schema]

        # dcim_device_roles_create response
        ctx.schema["DeviceRole"].required = list(filter(lambda x: x not in ["device_count", "virtualmachine_count"], ctx.schema["DeviceRole"].required))

        # dcim_sites_create response
        ctx.schema["Sites"].required = list(filter(lambda x: x not in ["circuit_count","device_count", "prefix_count","rack_count","virtualmachine_count","vlan_count"], ctx.schema["Sites"].required))

        # dcim_devices_create response
        ctx.schema["Device"].required = list(filter(
            lambda x: x not in ["parent_device", "primary_ip"], ctx.schemas["Device"].required))

        # dcim_devices_create response
        ctx.schema["DeviceWithConfigContext"].required = list(filter(
            lambda x: x not in ["parent_device", "primary_ip"], ctx.schema["DeviceWithConfigContext"].required))

        for i in ["parent_device", "primary_ip"]:
            ctx.schema["DeviceWithConfigContext"].properties[i].nullable = True

        return ctx

class FixPaths(aiopenapi3.plugin.Init):

    @staticmethod
    def operationbyId(ctx, name):
        for path,item in ctx.paths.items():
            for i in ["get","post","patch","head"]:
                if (operation := getattr(item, i, None)) == None:
                    continue
                if operation.operationId == name:
                    return path,i,operation
        raise KeyError(name)

    @staticmethod
    def parameterbyname(operation, name):
        for parameter in operation.parameters:
            if parameter.name == name:
                return parameter
        raise KeyError(name)

    def paths(self, ctx: "Init.Context") -> "Init.Context":
        for i in ["dcim_device_roles_list", "dcim_manufacturers_list", "dcim_device_types_list","dcim_sites_list"]:
            _,_,op = self.operationbyId(ctx, i)
            p = self.parameterbyname(op, "slug")
            p.schema_ = aiopenapi3.v30.Schema(type="string")

        for i in ["dcim_devices_list"]:
            _,_,op = self.operationbyId(ctx, i)
            p = self.parameterbyname(op, "name")
            p.schema_ = aiopenapi3.v30.Schema(type="string")

        return ctx

from aiopenapi3.debug import httpx_debug_event_hooks_async

@pytest_asyncio.fixture(scope="session")
async def client(event_loop):
    url = f"http://127.0.0.1:8000/api/schema/"

    def sf(*args, **kwargs) -> httpx.AsyncClient:
        event_hooks = httpx_debug_event_hooks_async()
        return httpx.AsyncClient(*args, timeout=30, event_hooks=event_hooks, **kwargs)

    plugins = [SetServer(), FixSchemas(), FixPaths()]
    path = Path("/tmp/nb.pickle")
    try:
        api = aiopenapi3.OpenAPI.cache_load(path, plugins=plugins, session_factory=sf)
    except FileNotFoundError:
        api = await aiopenapi3.OpenAPI.load_async(url, session_factory=sf, plugins = plugins)
        api.cache_store(path)

    api.authenticate(tokenAuth="Token 0a2c1764de239b00de09f8b89758bf778ed916f4")
    return api

@pytest_asyncio.fixture(scope="session")
async def DeviceRole(client):
    """dcim_device_roles_create"""

    t = client._.dcim_device_roles_create.data.get_type()
    data = t(name="Server", slug="server")
    try:
        r = await client._.dcim_device_roles_create(data=data.model_dump(mode="json", exclude_unset=True))
    except aiopenapi3.errors.HTTPStatusError as e:
        if e.response.status_code == 400:
            r = await client._.dcim_device_roles_list(parameters={"slug":data.slug})
            assert r.count == 1
            r = r.results[0]
        else:
            logging.exception(e)
            raise e
    return r

@pytest_asyncio.fixture(scope="session")
async def DeviceType(client, Manufacturer):
    """dcim_device_types_create"""
    t = client._.dcim_device_types_create.data.get_type()
    data = t(manufacturer=Manufacturer.id, model="PowerEdge T320", slug="t320")
    try:
        r = await client._.dcim_device_types_create(data=data)
    except aiopenapi3.errors.HTTPStatusError as e:
        if e.response.status_code == 400:
            r = await client._.dcim_device_types_list(parameters={"slug":data.slug})
            assert r.count == 1
            r = r.results[0]
        else:
            logging.exception(e)
            raise e
    return r

@pytest_asyncio.fixture(scope="session")
async def Manufacturer(client):
    """dcim_manufacturers_create"""
    t = client._.dcim_manufacturers_create.data.get_type()
    data = t(name="Dell", slug="dell")
    try:
        r = await client._.dcim_manufacturers_create(data=data)
    except aiopenapi3.errors.HTTPStatusError as e:
        if e.response.status_code == 400:
            r = await client._.dcim_manufacturers_list(parameters={"slug":data.slug})
            assert r.count == 1
            r = r.results[0]
        else:
            logging.exception(e)
            raise e
    return r

@pytest_asyncio.fixture(scope="session")
async def Site(client):
    """dcim_sites_create"""
    t = client._.dcim_sites_create.data.get_type()
    data = t(name="Home", slug="home", status="active")
    try:
        r = await client._.dcim_sites_create(data=data.model_dump(mode="json", exclude_unset=True))
    except aiopenapi3.errors.HTTPStatusError as e:
        if e.response.status_code == 400:
            r = await client._.dcim_sites_list(parameters={"slug":data.slug})
            assert r.count == 1
            r = r.results[0]
        else:
            logging.exception(e)
            raise e

    return r

@pytest_asyncio.fixture(scope="session")
async def Device(client, DeviceRole, DeviceType, Site):
    """dcim_devices_create"""
    t = client._.dcim_devices_create.data.get_type()
    data = t(device_type=DeviceType.id, device_role=DeviceRole.id, site=Site.id, status="active", name="Server1")
    try:
        r = await client._.dcim_devices_list(parameters={"name":data.name})
        r = r.results[0]
    except IndexError:
        try:
            r = await client._.dcim_devices_create(data=data.model_dump(mode="json", exclude_unset=True))
        except aiopenapi3.errors.HTTPStatusError as e:
            print(e)
            r = None

    return r

@pytest.mark.asyncio
async def test_client(client):
    assert client

@pytest.mark.asyncio
async def test_create(client, Device):
    assert Device

Expected Behavior

A description document which matches the protocol.

Observed Behavior

Using the Netbox OpenAPI Description document to interface the service is impossible due to the mismatching definition of Schemas and Operations from the description document and the actual service. Even the server url definition in the document is wrong, making it impossible to create the proper path of an operation.

All PathItems query string parameters, which are defined to be of type array:

  /api/dcim/device-roles/:
    get:
      operationId: dcim_device_roles_list
      description: Get a list of device role objects.
      parameters:
      - in: query
        name: color
        schema:
          type: array
          items:
            type: string
        explode: true
        style: form

should be type: string.

For the response of _create (usually) there is properties required which do not exist, for _list (usually) there is properties which need to be nullable to parse the result. Sending null for empty properties on create causes errors, not sending the value at all works.

I tried DeviceRole DeviceType Manufacturer Site Device.

In case of a conflict (already exists) when creating an item, it's always 400 - which is not even defined as response for the operations.

Initially I tried to use the Netbox Description Document to create a netbox-webhook service using FastAPI, making use of the Description document to create models to parse the webhook data in FastAPI. I did not want to rely on a webhook protocol which is undocumented, where messages can't be validated and therefore was quite happy to see I could grab the definitions from the description document, not even implementing them on my own. Easy models for all webhook types - worked great for my netbox-docker with IPAddress, good opportunity to use python Generics, loved it. Next I was unable to figure out the format of the Snapshots model type for the netbox-plugin-dns, it is different from everything which is defined in the description document and basically as undocumented as the format of all other webhook types. Next I figured the Snapshot format for IPAddress was different from the Schema I used (IPAddressRequest) when the address had more properties populated.

So I decided to test if the API was matching the description document. It's basically a pytest, using aiopenapi3, making use of the aiopenapi3 plugin system to adjust the definitions along the way to match the protocol so I could get to the point to create a Device and have all the requirements fulfilled by using the API.

Are you interested in integrating unit tests to match the openapi description document to the API in your pipeline? It would be beneficial to the quality of the description document, current state is not usable for the API and webhooks if you want to validate the messages/use objects.

arthanson commented 1 year ago

@commonism Thank you for the report, there is a lot here so a bit difficult to parse through it - it looks like there are several bugs and a feature suggestion / request all bundled together here, it would be helpful to break these apart into separate issues. It looks like the potential bugs/issues you brought up are:

  1. url definition in the document is wrong
  2. color query parma is type array and should be type string?
  3. There are properties which need to be nullable? Not sure on this one - from "_create (usually) there is properties required which do not exist, for _list (usually) there is properties which need to be nullable to parse the result."
  4. 400 needs to be listed as response? "In case of a conflict (already exists) when creating an item, it's always 400"

Not sure if I missed any in here, can you please open separate issues for each of these with a short example and expected behavior, or for any I missed above. For example, #3 above I'm not even sure what that is in relation to.

It sounds like there is also a feature suggestion "Are you interested in integrating unit tests to match the openapi description document to the API in your pipeline?" - if there is a good way to do a validation of the schema that would be good, but it would depend on what is involved in it and if it required extra dependencies. If you have a proposed solution, please open a ticket for that as well.

commonism commented 1 year ago

The only operation I worked with without any issues so far is ipam_ip_addresses_create - the others:

basically every Parameter I worked with was flawed in the description document and >90% of the Operations.

Due to the size of the description document:

πŸš— References: 607 πŸ“¦ External Documents: 0 πŸ“ˆ Schemas: 613 πŸ‘‰ Parameters: 1269 πŸ”— Links: 0 ➑️ Path Items: 235 πŸ‘· Operations: 929 πŸ”– Tags: 12

I expect this to be like ~1000+ issues. This is structural problem which has to be addressed aside from fixing single issue.

If you are interested in working this out - great, I got some unit tests to start with and I'm happy to share. They need a github CI pipeline which deploys a netbox-docker service for testing. Once this is in place, you'll have automated validation of the API for every change.

And there is plenty of unaddressed https://github.com/netbox-community/netbox/labels/topic%3A%20OpenAPI issues already, do not want to add another.

commonism commented 1 year ago

As an example of such issue - lets take a random operation … dcim_devices_list

  /api/dcim/devices/:
    get:
      operationId: dcim_devices_list
      description: Get a list of device objects.
      parameters:
…
      - in: query
        name: name
        schema:
          type: array
          items:
            type: string
        explode: true
        style: form
…
      tags:
      - dcim
      security:
      - cookieAuth: []
      - tokenAuth: []
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PaginatedDeviceWithConfigContextList'
          description: ''

and the response type

    PaginatedDeviceWithConfigContextList:
      type: object
      properties:
        count:
          type: integer
          example: 123
        next:
          type: string
          nullable: true
          format: uri
          example: http://api.example.org/accounts/?offset=400&limit=100
        previous:
          type: string
          nullable: true
          format: uri
          example: http://api.example.org/accounts/?offset=200&limit=100
        results:
          type: array
          items:
            $ref: '#/components/schemas/DeviceWithConfigContext'

and the array items reference

    DeviceWithConfigContext:
      type: object
      description: Adds support for custom fields and tags.
      properties:
        id:
          type: integer
          readOnly: true
        url:
          type: string
          format: uri
          readOnly: true
        display:
          type: string
          readOnly: true
        name:
          type: string
          nullable: true
          maxLength: 64
        device_type:
          $ref: '#/components/schemas/NestedDeviceType'
        device_role:
          $ref: '#/components/schemas/NestedDeviceRole'
        tenant:
          allOf:
          - $ref: '#/components/schemas/NestedTenant'
          nullable: true
        platform:
          allOf:
          - $ref: '#/components/schemas/NestedPlatform'
          nullable: true
        serial:
          type: string
          title: Serial number
          description: Chassis serial number, assigned by the manufacturer
          maxLength: 50
        asset_tag:
          type: string
          nullable: true
          description: A unique tag used to identify this device
          maxLength: 50
        site:
          $ref: '#/components/schemas/NestedSite'
        location:
          allOf:
          - $ref: '#/components/schemas/NestedLocation'
          nullable: true
        rack:
          allOf:
          - $ref: '#/components/schemas/NestedRack'
          nullable: true
        position:
          type: number
          format: double
          maximum: 1000
          minimum: 0.5
          exclusiveMaximum: true
          nullable: true
          title: Position (U)
        face:
          type: object
          properties:
            value:
              enum:
              - front
              - rear
              - ''
              type: string
              description: |-
                * `front` - Front
                * `rear` - Rear
            label:
              type: string
              enum:
              - Front
              - Rear
        parent_device:
          allOf:
          - $ref: '#/components/schemas/NestedDevice'
          readOnly: true
        status:
          type: object
          properties:
            value:
              enum:
              - offline
              - active
              - planned
              - staged
              - failed
              - inventory
              - decommissioning
              type: string
              description: |-
                * `offline` - Offline
                * `active` - Active
                * `planned` - Planned
                * `staged` - Staged
                * `failed` - Failed
                * `inventory` - Inventory
                * `decommissioning` - Decommissioning
            label:
              type: string
              enum:
              - Offline
              - Active
              - Planned
              - Staged
              - Failed
              - Inventory
              - Decommissioning
        airflow:
          type: object
          properties:
            value:
              enum:
              - front-to-rear
              - rear-to-front
              - left-to-right
              - right-to-left
              - side-to-rear
              - passive
              - mixed
              - ''
              type: string
              description: |-
                * `front-to-rear` - Front to rear
                * `rear-to-front` - Rear to front
                * `left-to-right` - Left to right
                * `right-to-left` - Right to left
                * `side-to-rear` - Side to rear
                * `passive` - Passive
                * `mixed` - Mixed
            label:
              type: string
              enum:
              - Front to rear
              - Rear to front
              - Left to right
              - Right to left
              - Side to rear
              - Passive
              - Mixed
        primary_ip:
          allOf:
          - $ref: '#/components/schemas/NestedIPAddress'
          readOnly: true
        primary_ip4:
          allOf:
          - $ref: '#/components/schemas/NestedIPAddress'
          nullable: true
        primary_ip6:
          allOf:
          - $ref: '#/components/schemas/NestedIPAddress'
          nullable: true
        cluster:
          allOf:
          - $ref: '#/components/schemas/NestedCluster'
          nullable: true
        virtual_chassis:
          allOf:
          - $ref: '#/components/schemas/NestedVirtualChassis'
          nullable: true
        vc_position:
          type: integer
          maximum: 255
          minimum: 0
          nullable: true
        vc_priority:
          type: integer
          maximum: 255
          minimum: 0
          nullable: true
          description: Virtual chassis master election priority
        description:
          type: string
          maxLength: 200
        comments:
          type: string
        local_context_data:
          type: object
          additionalProperties: {}
          nullable: true
          description: Local config context data takes precedence over source contexts
            in the final rendered config context
        tags:
          type: array
          items:
            $ref: '#/components/schemas/NestedTag'
        custom_fields:
          type: object
          additionalProperties: {}
        config_context:
          type: object
          additionalProperties: {}
          nullable: true
          readOnly: true
        config_template:
          allOf:
          - $ref: '#/components/schemas/NestedConfigTemplate'
          nullable: true
        created:
          type: string
          format: date-time
          readOnly: true
          nullable: true
        last_updated:
          type: string
          format: date-time
          readOnly: true
          nullable: true
      required:
      - config_context
      - created
      - device_role
      - device_type
      - display
      - id
      - last_updated
      - parent_device
      - primary_ip
      - site
      - url

now - I claimed

parameter name is defined as array of string

      - in: query
        name: name
        schema:
          type: array
          items:
            type: string
        explode: true
        style: form

confirmed and by searching this for "type: array" you can already spot other query parameters which are defined invalid.

response missing ["parent_device", "primary_ip"]

the response

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": 6,
      "url": "http://127.0.0.1:8000/api/dcim/devices/6/",
      "display": "Server2",
      "name": "Server2",
      "device_type": {
        "id": 5,
        "url": "http://127.0.0.1:8000/api/dcim/device-types/5/",
        "display": "PowerEdge T320",
        "manufacturer": {
          "id": 5,
          "url": "http://127.0.0.1:8000/api/dcim/manufacturers/5/",
          "display": "Dell",
          "name": "Dell",
          "slug": "dell"
        },
        "model": "PowerEdge T320",
        "slug": "t320"
      },
      "device_role": {
        "id": 4,
        "url": "http://127.0.0.1:8000/api/dcim/device-roles/4/",
        "display": "Server",
        "name": "Server",
        "slug": "server"
      },
      "tenant": null,
      "platform": null,
      "serial": "",
      "asset_tag": null,
      "site": {
        "id": 3,
        "url": "http://127.0.0.1:8000/api/dcim/sites/3/",
        "display": "Home",
        "name": "Home",
        "slug": "home"
      },
      "location": null,
      "rack": null,
      "position": null,
      "face": null,
      "parent_device": null,
      "status": {
        "value": "active",
        "label": "Active"
      },
      "airflow": null,
      "primary_ip": null,
      "primary_ip4": null,
      "primary_ip6": null,
      "cluster": null,
      "virtual_chassis": null,
      "vc_position": null,
      "vc_priority": null,
      "description": "",
      "comments": "",
      "local_context_data": null,
      "tags": [],
      "custom_fields": {},
      "config_context": {},
      "config_template": null,
      "created": "2023-08-04T06:21:27.676342Z",
      "last_updated": "2023-08-04T16:19:42.060416Z"
    }
  ]
}

definition of parent_device

        parent_device:
          allOf:
          - $ref: '#/components/schemas/NestedDevice'
          readOnly: true

value:

      "parent_device": null,

definition primary_ip:

        primary_ip:
          allOf:
          - $ref: '#/components/schemas/NestedIPAddress'

value

      "primary_ip": null,
kkthxbye-code commented 1 year ago

All PathItems query string parameters, which are defined to be of type array: should be type: string.

ipam_ip_addresses_list parameter address is defined as array of string

https://demo.netbox.dev/api/ipam/ip-addresses/?address=192.168.0.1%2F22&address=192.168.0.2%2F22

They are arrays though? And the swagger UI handles this fine: image

Maybe there's not actually "1000+" issues, but you have misunderstood some of them?

commonism commented 1 year ago

Good call. I was not passing a list/array for the array parameters - validation complained about my invalid parameter type. Repaired this in the wrong place. arrays actually work for …_list when used properly.

Down to ~50% of operations I tested being bad.

arthanson commented 1 year ago

@commonism that is the problem having all these together in one issue, part of them may be incorrect usage and part may be actual bugs and it would take a maintainer a lot of time to go through and try to figure out what each of the problems you are referencing, try to come up with a repro scenario for each one then figure out if it is an actual bug or incorrect usage.

The "url definition in the document is wrong" looks like it could be a bug, I'm going to check and if so open a separate issue for that.

I'm not sure what the other 50% of the issues are you are referencing actually are, please open a separate issue for each of them. We don't expect an issue for each instance of a bug (for example if a string type is returned incorrectly in multiple type definitions) just a single bug with an example and reference the same issue appears in multiple types. For example that query string param being a list above would have been just one issue, even though it effects many types. This issue can then be closed out.

commonism commented 1 year ago

All _create operations I tried return data which is missing properties which are defined as required - and without default value. Thats been 50% of the operations I worked with, the other 50% was _list operations.

Without unit testing infrastructure to validate changes - I'm afraid this won't work out.

kkthxbye-code commented 1 year ago

Without unit testing infrastructure to validate changes - I'm afraid this won't work out.

We'll have to close out this issue then. If you change your mind, you are more than welcome to create split out issue, or even a feature request requesting the aforementioned infrastructure.

jsenecal commented 1 year ago

@commonism You are welcome to open a feature request to implement unit tests against the schema like you proposed. This would go a long way in identifying said issues. You also seem to have already done some work in that direction, Feel free to ask to be assigned to the feature request.

Once this is all in place, we could tackle one bug at a time. :)

Thanks for your energy and contribution in documenting the problem you are experiencing.

Edit: Looks like you already did https://github.com/netbox-community/netbox/issues/13420 πŸ˜„