netbox-community / netbox

The premier source of truth powering network automation. Open source under Apache 2. Try NetBox Cloud free: https://netboxlabs.com/free-netbox-cloud/
http://netboxlabs.com/oss/netbox/
Apache License 2.0

Slow GraphQL performance after upgrading to v3.5.3 #13216

Closed glesys-andreas closed 9 months ago

glesys-andreas commented 1 year ago

NetBox version

v3.5.6

Python version

3.9

Steps to Reproduce

  1. Create a VLAN with VID 999 and add it to a VLAN group (which group does not matter).
  2. Create roughly 400-500 prefixes, for example: [{"prefix": "10.0.0.0/24", "tenant": "Cyberdyne Systems", "status": "active", "vlan": 999}, {"prefix": "10.0.1.0/24", "tenant": "Cyberdyne Systems", "status": "active", "vlan": 999}, ...]
  3. Run this query:
    query allActivePrefixes {
        prefix_list(status: "active", family: 4) {
            prefix
            description
            tenant {
                name
            }
            vlan {
                group {
                    name
                }
                vid
            }
        }
    }
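Steps 2 and 3 can be scripted. Below is a minimal Python sketch. It assumes the prefix records use the same shape as the JSON in step 2, and it leaves open how they are loaded into NetBox. It also assumes the query is sent as a standard GraphQL-over-HTTP POST to NetBox's `/graphql/` endpoint with a `Token` authorization header. The helper names `build_prefixes` and `build_graphql_request`, the base URL, and the token are hypothetical placeholders.

```python
import ipaddress
import json
import urllib.request
from itertools import islice

# Same fields as the query in step 3 (whitespace compacted).
QUERY = """
query allActivePrefixes {
  prefix_list(status: "active", family: 4) {
    prefix
    description
    tenant { name }
    vlan { group { name } vid }
  }
}
"""


def build_prefixes(count, tenant="Cyberdyne Systems", vid=999):
    """Hypothetical helper: return `count` consecutive /24s carved from
    10.0.0.0/8, in the same shape as the example records in step 2."""
    subnets = ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=24)
    return [
        {"prefix": str(net), "tenant": tenant, "status": "active", "vlan": vid}
        for net in islice(subnets, count)
    ]


def build_graphql_request(base_url, token, query=QUERY):
    """Hypothetical helper: build (but do not send) a POST request for the
    query. Assumes the GraphQL endpoint lives at <base_url>/graphql/."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphql/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Token {token}",
        },
    )


if __name__ == "__main__":
    prefixes = build_prefixes(500)
    print(prefixes[0]["prefix"], prefixes[-1]["prefix"])
    # Placeholder URL and token; pass the request to urllib.request.urlopen()
    # to actually send it to a NetBox instance.
    req = build_graphql_request("https://netbox.example.com", "0123456789abcdef")
    print(req.get_method(), req.full_url)
```

Generating the data from a fixed parent network keeps runs reproducible, so the same prefix set can be loaded into two NetBox versions for comparison.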

Expected Behavior

In our production environment, running v3.5.2, this query takes about 200-400 ms and returns about 1700 prefixes.

Observed Behavior

Test 1: In our test environment, running v3.5.6, the same query takes about 6-8 seconds and returns the same data as production (1700 prefixes).

Test 2: After creating about 500 VLANs on demo.netbox.dev (currently v3.5.6), we got about the same response time there: 6-8 seconds.

The problem started when we upgraded to v3.5.3.
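When comparing versions as in Tests 1 and 2, timing the same query several times and comparing medians is less noisy than relying on single runs. Below is a minimal sketch. It assumes you pass in any zero-argument callable that sends the query, such as a function that POSTs it to `/graphql/`. `time_query` and its `runs` parameter are hypothetical names.

```python
import statistics
import time


def time_query(send, runs=5):
    """Hypothetical helper: call `send()` `runs` times and return
    (median_seconds, samples). This measures wall-clock time around each
    call, i.e. what a GraphQL client actually waits for."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), samples


if __name__ == "__main__":
    # Stand-in for a real request so the sketch runs without a NetBox server.
    median, samples = time_query(lambda: time.sleep(0.01), runs=3)
    print(f"median {median * 1000:.0f} ms over {len(samples)} runs")
```

Running this against a v3.5.2 instance and a v3.5.6 instance loaded with identical data would make the two sets of response times directly comparable.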

kkthxbye-code commented 1 year ago

Is it possible for you to verify whether 3.5.0 or 3.5.1 is also slow? Looking at the changes, 3.5.2 is the only release with the upgraded graphene-django version (3.0.2), which broke the GraphQL explorer. It was reverted to 3.0.0 in 3.5.3.

For context:

https://github.com/netbox-community/netbox/issues/12762
https://github.com/netbox-community/netbox/issues/12762#issuecomment-1569880855

Upgrading graphene-django would require fixing the GraphQL explorer. That's not necessarily hard; someone just needs to do it.
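To confirm which graphene-django version a given installation is actually running, you can read the package metadata from inside NetBox's Python environment. Below is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper name, and the sketch assumes the distribution is named `graphene-django` as in NetBox's requirements.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="graphene-django"):
    """Hypothetical helper: return the installed version string of a
    distribution, or None if it is not installed in this environment."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Per the comment above: 3.0.2 shipped only in NetBox 3.5.2,
    # and 3.5.3 reverted to 3.0.0.
    print("graphene-django:", installed_version() or "not installed")
```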

glesys-andreas commented 1 year ago

I ran the same test on a host running version 3.5.0. The query takes about 5 seconds with 1600 prefixes and no other load on the system.

jeremystretch commented 1 year ago

@glesys-andreas have you been able to identify any changes that would improve performance?

This may become a moot point if we adopt Strawberry for v4.0 (see #9856).

glesys-andreas commented 12 months ago

Hi, we are currently running 3.6.0 in our test environment and are seeing the same slow performance there.

It would be nice to see it implemented. We use the GraphQL API a lot, so it would make things a bit snappier :)

jeremystretch commented 12 months ago

@glesys-andreas as we've been struggling to collect feedback on the potential migration, it may not happen. Could you please review #13583 and let us know what you think of the proposed change to the query syntax?

glesys-andreas commented 12 months ago

@jeremystretch I've checked with the developers who build our integrations. If you decide to switch, there won't be any issue on our side; we'll just rewrite our scripts to match the new syntax.

jeremystretch commented 9 months ago

I'm going to close this out as no specific changes have been proposed. If anyone would like to continue researching potential optimizations, please feel free to do so, and submit a new issue detailing the specific changes to be made if you find something.