netboxlabs / netbox-branching

Official NetBox Labs plugin that implements git-like branching functionality for NetBox
http://netboxlabs.com
Other
69 stars 1 forks source link

Unable to merge a branch with plugin data #56

Open peteeckel opened 3 months ago

peteeckel commented 3 months ago

Plugin Version

0.3.1

NetBox Version

4.1-beta1

Python Version

3.11.5

Steps to Reproduce

  1. Create fresh NetBox DB
  2. Install the current version of the branching plugin and add it to the configuration
  3. Install NetBox DNS 1.1b4 from PyPI and add it to the configuration
  4. Run the migrations and restart NetBox and RQ
  5. Create a new branch "Test"
  6. Activate the new branch
  7. Create a NameServer object in NetBox DNS (e.g. "ns1.example.com")
  8. Create a bunch of zones in NetBox DNS (via import):
    name,nameservers,soa_rname,soa_mname
    zone1.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone2.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone3.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone4.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone5.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone6.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone7.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
    zone8.example.com,ns1.example.com,hostmaster.example.com,ns1.example.com
  9. Deactivate the test branch
  10. Try to merge the test branch

Expected Behavior

The branch is merged and the zones and the nameserver are now available in the main branch

Observed Behavior

Screenshot 2024-08-17 at 15 52 13

This is not the only way I could reproduce this - at one point I got the same error just with an IP Address, without any NetBox DNS objects involved (unfortunately I didn't get a screen shot, though).

Not sure how this relates to #43 - so far I haven't been able to reproduce that one with a fresh DB.

peteeckel commented 3 months ago
Screenshot 2024-08-17 at 16 41 06

I can't make any sense of it yet - this time it was the IP address that triggered the error. Changes look pretty simple:

Screenshot 2024-08-17 at 16 43 10
cruse1977 commented 3 months ago

headers for the import

name,nameservers,soa_rname,soa_mname

cruse1977 commented 3 months ago

Error confirmed:

{
    "log": [
        "Merging branch test (branch_2ze7gm26)",
        "Applying change: NetBox DNS | Nameserver ns1.example.com created by admin",
        "Applying change: NetBox DNS | zone zone1.example.com created by admin",
        "Zone matching query does not exist."
    ]
}
cruse1977 commented 3 months ago

Post change data for fail:

name: "zone1.example.com"
tags: []
view: 1
status: "active"
tech_c: null
tenant: null
admin_c: null
soa_ttl: 86400
billing_c: null
registrar: null
soa_mname: 1
soa_retry: 7200
soa_rname: "hostmaster.example.com"
registrant: null
soa_expire: 2419200
soa_serial: 1
default_ttl: 86400
description: ""
nameservers: [
    1
]
soa_minimum: 3600
soa_refresh: 43200
arpa_network: null
custom_fields: {}
rfc2317_prefix: null
soa_serial_auto: false
registry_domain_id: ""
rfc2317_parent_zone: null
rfc2317_parent_managed: false
cruse1977 commented 3 months ago

LOG

2024-08-22T16:22:56.805484+00:00 nbl-branching-privatepreview-plugins postgres[15843]: [39-1] 2024-08-22 16:22:56.804 UTC [15843] netbox@netbox LOG:  statement: SELECT 1 AS "a" FROM "netbox_dns_view" WHERE "netbox_dns_view"."id" = 1 LIMIT 1
2024-08-22T16:22:56.806121+00:00 nbl-branching-privatepreview-plugins postgres[15843]: [40-1] 2024-08-22 16:22:56.805 UTC [15843] netbox@netbox LOG:  statement: SELECT 1 AS "a" FROM "netbox_dns_nameserver" WHERE "netbox_dns_nameserver"."id" = 1 LIMIT 1
2024-08-22T16:22:56.814416+00:00 nbl-branching-privatepreview-plugins postgres[15843]: [41-1] 2024-08-22 16:22:56.813 UTC [15843] netbox@netbox LOG:  statement: SELECT "netbox_dns_zone"."id", "netbox_dns_zone"."created", "netbox_dns_zone"."last_updated", "netbox_dns_zone"."custom_field_data", "netbox_dns_zone"."view_id", "netbox_dns_zone"."name", "netbox_dns_zone"."status", "netbox_dns_zone"."default_ttl", "netbox_dns_zone"."soa_ttl", "netbox_dns_zone"."soa_mname_id", "netbox_dns_zone"."soa_rname", "netbox_dns_zone"."soa_serial", "netbox_dns_zone"."soa_refresh", "netbox_dns_zone"."soa_retry", "netbox_dns_zone"."soa_expire", "netbox_dns_zone"."soa_minimum", "netbox_dns_zone"."soa_serial_auto", "netbox_dns_zone"."description", "netbox_dns_zone"."arpa_network", "netbox_dns_zone"."tenant_id", "netbox_dns_zone"."registrar_id", "netbox_dns_zone"."registry_domain_id", "netbox_dns_zone"."registrant_id", "netbox_dns_zone"."admin_c_id", "netbox_dns_zone"."tech_c_id",
2024-08-22T16:22:56.814581+00:00 nbl-branching-privatepreview-plugins postgres[15843]: [41-2]  "netbox_dns_zone"."billing_c_id", "netbox_dns_zone"."rfc2317_prefix", "netbox_dns_zone"."rfc2317_parent_managed", "netbox_dns_zone"."rfc2317_parent_zone_id", "netbox_dns_zone"."status" IN ('active') AS "active" FROM "netbox_dns_zone" WHERE "netbox_dns_zone"."id" = 1 LIMIT 21
2024-08-22T16:22:56.814621+00:00 nbl-branching-privatepreview-plugins postgres[15843]: [42-1] 2024-08-22 16:22:56.814 UTC [15843] netbox@netbox LOG:  statement: ROLLBACK

SQL Query

SELECT "netbox_dns_zone"."id", "netbox_dns_zone"."created", "netbox_dns_zone"."last_updated", "netbox_dns_zone"."custom_field_data", "netbox_dns_zone"."view_id", "netbox_dns_zone"."name", "netbox_dns_zone"."status", "netbox_dns_zone"."default_ttl", "netbox_dns_zone"."soa_ttl", "netbox_dns_zone"."soa_mname_id", "netbox_dns_zone"."soa_rname", "netbox_dns_zone"."soa_serial", "netbox_dns_zone"."soa_refresh", "netbox_dns_zone"."soa_retry", "netbox_dns_zone"."soa_expire", "netbox_dns_zone"."soa_minimum", "netbox_dns_zone"."soa_serial_auto", "netbox_dns_zone"."description", "netbox_dns_zone"."arpa_network", "netbox_dns_zone"."tenant_id", "netbox_dns_zone"."registrar_id", "netbox_dns_zone"."registry_domain_id", "netbox_dns_zone"."registrant_id", "netbox_dns_zone"."admin_c_id", "netbox_dns_zone"."tech_c_id","netbox_dns_zone"."billing_c_id", "netbox_dns_zone"."rfc2317_prefix", "netbox_dns_zone"."rfc2317_parent_managed", "netbox_dns_zone"."rfc2317_parent_zone_id", "netbox_dns_zone"."status" IN ('active') AS "active" FROM "netbox_dns_zone" WHERE "netbox_dns_zone"."id" = 1 LIMIT 21

Manually editting the SQL query to run against the branch schema works as expected (returns an entry)

cruse1977 commented 3 months ago
2024-08-22 17:21:17,123 netbox_branching.branch.merge INFO: Merging branch test (branch_2ze7gm26)
2024-08-22 17:21:17,172 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | Nameserver ns1.example.com created by admin
2024-08-22 17:21:17,176 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | Nameserver ns1.example.com created by admin using default
2024-08-22 17:21:17,184 netbox_branching.change.apply DEBUG: Creating Nameserver <DeserializedObject: netbox_dns.NameServer(pk=1)>
2024-08-22 17:21:17,345 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | zone zone1.example.com created by admin
2024-08-22 17:21:17,349 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | zone zone1.example.com created by admin using default
2024-08-22 17:21:17,351 netbox_branching.change.apply DEBUG: Creating zone <DeserializedObject: netbox_dns.Zone(pk=1)>
2024-08-22 17:21:17,374 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:17,375 netbox_branching.change.apply DEBUG: Type: DoesNotExist was raised Zone matching query does not exist.
2024-08-22 17:21:17,380 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:17,832 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone1.example.com [SOA] created by admin
2024-08-22 17:21:17,838 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | record zone1.example.com [SOA] created by admin using default
2024-08-22 17:21:17,839 netbox_branching.change.apply DEBUG: Creating record <DeserializedObject: netbox_dns.Record(pk=1)>
2024-08-22 17:21:17,969 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone1.example.com [NS] created by admin
2024-08-22 17:21:17,974 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | record zone1.example.com [NS] created by admin using default
2024-08-22 17:21:17,976 netbox_branching.change.apply DEBUG: Creating record <DeserializedObject: netbox_dns.Record(pk=2)>
2024-08-22 17:21:18,007 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:18,007 netbox_branching.change.apply DEBUG: Type: ValidationError was raised {'value': ['There is already an active NS record for name @ in zone zone1.example.com with value ns1.example.com..']}
2024-08-22 17:21:18,010 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:18,096 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | zone zone2.example.com created by admin
2024-08-22 17:21:18,100 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | zone zone2.example.com created by admin using default
2024-08-22 17:21:18,102 netbox_branching.change.apply DEBUG: Creating zone <DeserializedObject: netbox_dns.Zone(pk=2)>
2024-08-22 17:21:18,119 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:18,119 netbox_branching.change.apply DEBUG: Type: DoesNotExist was raised Zone matching query does not exist.
2024-08-22 17:21:18,120 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:18,503 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone2.example.com [SOA] created by admin
2024-08-22 17:21:18,509 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | record zone2.example.com [SOA] created by admin using default
2024-08-22 17:21:18,511 netbox_branching.change.apply DEBUG: Creating record <DeserializedObject: netbox_dns.Record(pk=3)>
2024-08-22 17:21:18,652 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone2.example.com [NS] created by admin
2024-08-22 17:21:18,656 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | record zone2.example.com [NS] created by admin using default
2024-08-22 17:21:18,658 netbox_branching.change.apply DEBUG: Creating record <DeserializedObject: netbox_dns.Record(pk=4)>
2024-08-22 17:21:18,686 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:18,688 netbox_branching.change.apply DEBUG: Type: ValidationError was raised {'value': ['There is already an active NS record for name @ in zone zone2.example.com with value ns1.example.com..']}
2024-08-22 17:21:18,689 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:18,783 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | zone zone3.example.com created by admin
2024-08-22 17:21:18,787 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | zone zone3.example.com created by admin using default
2024-08-22 17:21:18,788 netbox_branching.change.apply DEBUG: Creating zone <DeserializedObject: netbox_dns.Zone(pk=3)>
2024-08-22 17:21:18,803 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:18,805 netbox_branching.change.apply DEBUG: Type: DoesNotExist was raised Zone matching query does not exist.
2024-08-22 17:21:18,805 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:18,978 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone3.example.com [SOA] created by admin
2024-08-22 17:21:18,980 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | record zone3.example.com [SOA] created by admin using default
2024-08-22 17:21:18,981 netbox_branching.change.apply DEBUG: Creating record <DeserializedObject: netbox_dns.Record(pk=5)>
2024-08-22 17:21:19,037 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone3.example.com [NS] created by admin
2024-08-22 17:21:19,040 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | record zone3.example.com [NS] created by admin using default
2024-08-22 17:21:19,040 netbox_branching.change.apply DEBUG: Creating record <DeserializedObject: netbox_dns.Record(pk=6)>
2024-08-22 17:21:19,051 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:19,052 netbox_branching.change.apply DEBUG: Type: ValidationError was raised {'value': ['There is already an active NS record for name @ in zone zone3.example.com with value ns1.example.com..']}
2024-08-22 17:21:19,054 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:19,090 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | zone zone4.example.com created by admin
2024-08-22 17:21:19,092 netbox_branching.change.apply DEBUG: Applying change NetBox DNS | zone zone4.example.com created by admin using default
2024-08-22 17:21:19,093 netbox_branching.change.apply DEBUG: Creating zone <DeserializedObject: netbox_dns.Zone(pk=4)>
2024-08-22 17:21:19,097 netbox_branching.change.apply DEBUG: exception on full clean
2024-08-22 17:21:19,097 netbox_branching.change.apply DEBUG: Type: DoesNotExist was raised Zone matching query does not exist.
2024-08-22 17:21:19,098 netbox_branching.change.apply DEBUG: None
2024-08-22 17:21:19,231 netbox_branching.branch.merge DEBUG: Applying change: NetBox DNS | record zone4.example.com [SOA] created by admin
peteeckel commented 3 months ago
2024-08-22 17:21:18,688 netbox_branching.change.apply DEBUG: Type: ValidationError was raised {'value': ['There is already an active NS record for name @ in zone zone2.example.com with value ns1.example.com..']}

I can see where that comes from. Replaying the change log creates the zone, which automatically creates the NS record, and then the branching plugin tries to replay the action adding the NS record as well - which would result in a duplicate record, and hence is refused.

Which is pretty exactly what I meant yesterday :-/

cruse1977 commented 3 months ago

2024-08-22 17:21:17,375 netbox_branching.change.apply DEBUG: Type: DoesNotExist was raised Zone matching query does not exist.

I think so too - certainly for the validation errors. However I don't think we've explained this yet ?

peteeckel commented 3 months ago

The only explanation I have is that branching tries to create records for a zone when the create operation for the zone has not been replayed yet.

I would expect the change log to have the operations in the order they have been originally executed, i.e. first the zone.create() and then the record.create(). But I also remember one of the wierdest bugs in NetBox I've found so far: https://github.com/netbox-community/netbox/issues/15194

I'm not sure there is a relation between the two, but basically the fix for that issue was to deduplicate operations in the event queue, and I can imagine that is where the order of execution gets reversed. Does the change log make use of the event queue? If so, this could be the cause of the problem.

cruse1977 commented 2 months ago

This doesn't occur on creating name servers, only zones. Specific this is the call in the class Zone: clean() method which triggers the error in merge:

image
peteeckel commented 2 months ago

Hmmm ... that's interesting. Boiled down to the minimum it means that I check whether the object has a pk (which means it has been saved to the DB), then retrieve the object from the DB with that pk and it's not there. This can't happen in plain Django, unless the object has been deleted between submitting the form and trying to retrieve the object (it hasn't), the pk is fake (could be, I haven't looked at the branching code yet), or the lookup goes to the wrong branch.

My first guess would be the latter.

jeremystretch commented 2 months ago

I think @cruse1977 has it nailed down: During import, the zones are created within the branch schema. On merge, the creation of each zone is "replayed" onto the main schema, using the change record from the branch. The zone's primary key is preserved and copied from the branch instance.

When the model is then validated prior to calling save(), Zone.clean() issues a query for Zone.objects.get(pk=self.pk). This query resolves to the main schema, as no branch connection is specified via using. Obviously, a zone with that PK doesn't exist in the main schema yet, hence the exception.

While perhaps the most obvious solution would be to activate the branch during model validation, this won't work as uniqueness validation will fail for the PK and any other unique fields.

Unfortunately I'm not sure there's much we can do to accommodate this particular approach. But I can recommend a workaround if you're open to it: Rather than querying for the object's original attributes under clean(), you can override the model's __init__() method to record any relevant attributes on initialization for reference during validation. We do this in a few places within NetBox itself, which seems to work quite well (and it avoids the extra database query).

peteeckel commented 2 months ago

@jeremystretch Thanks, that looks promising ... I'm using the pattern in several places, so this could sum up to a slightly improved performance as well. Let me see if I can get rid of all the database queries that way.

peteeckel commented 2 months ago

Update: I no longer do that kind of query now. Fortunately I had already implemenred a mixin that enhances classes by a changed_fields method that shows which fields have changed compared to the version of an object in the database, so the change was actually pretty well contained.

Unfortunately I run into a different issue now. I still can't merge the branch, just the place where it happens has changed:

Screenshot 2024-09-11 at 19 37 33

I remember I've seen that one before.

I think I have another major issue to tackle. From Jeremy's remarks to #89 I take that when I call save() from a save() method (either super().save() or the save() method of another model) the using kwarg is critical. That absolutely makes sense as in some cases I need to create e.g. NS records for a zone, and that needs to happen within the branch the save() is executed in. That would mean passing the using kwarg from the arguments save() was called with down to any save() that's called along the way, correct? Anything else I need to observe? Does it make sense to add a using(...) to any object manager call?

jeremystretch commented 2 months ago

Unfortunately I run into a different issue now.

I believe this was resolved under #94 and will be fixed in the upcoming v0.5.0 release.