nautobot / nautobot-app-golden-config

Golden Configuration App for Nautobot.
https://docs.nautobot.com/projects/golden-config/en/latest/
Other
100 stars 57 forks source link

Redundant Device Settings Map Query #721

Closed jdrew82 closed 8 months ago

jdrew82 commented 8 months ago

Environment

Expected Behavior

When I run the Execute All Golden Configuration Jobs - Multiple Device Job I expect the device to settings map query is performed only once as doing it multiple times is unnecessary and redundant. In addition, when running against a large number of devices the amount of time for that query is exponentially longer and thus doubled when doing the All Golden Configuration Jobs.

Observed Behavior

The query to match devices to Golden Config settings mappings is being done twice, once for intended and once for compliance portion. The compliance portion is unneeded as it's already done in intended.

Steps to Reproduce

  1. Setup Nautobot and Golden Config.
  2. Run All Golden Configuration jobs - Multiple Device Job against 1k+ devices.
  3. Watch logs for gathering device info and amount of time it takes.
itdependsnetworks commented 8 months ago

@whitej6 helped make this better once: #324, i'm not sure what else we can do, so open to ideas.

itdependsnetworks commented 8 months ago

code in reference https://github.com/nautobot/nautobot-app-golden-config/blob/ec8ffebec07b9b30f54fde3e4dc844022a9e3874/nautobot_golden_config/utilities/helper.py#L171-L180

whitej6 commented 8 months ago

In v2 we should be able to improve performance. I believe there's a pre step we can take advantage of now, do the query there and assign to self

lampwins commented 8 months ago

I was trying to address this in https://github.com/nautobot/nautobot-app-golden-config/pull/630 but I think that intent got lost in it being a UI issue on the device detail at the time.

There a few things here.

First device.dynamic_groups.exclude() actually evaluates all DGs to create a queryset of all groups the device is a member of, and then does the exclude on that resultant queryset. Instead, we should filter for only DGs that have a GC setting object attached, then calculate membership to limit the scope.

We should try to use the DG cache here.

We can do this entire code block in one query.

Something like this (derived from my PR, but not tested):

def get_device_to_settings_map(queryset): 
     """Helper function to map settings to devices.""" 
     device_to_settings_map = {} 
     for device in queryset:
         try: 
             dynamic_group = (
                DynamicGroup.objects.filter(golden_config_setting__isnull=False)
                .get_for_object(device, use_cache=True)
                .order_by("-golden_config_setting__weight")
                .first()
             )
         except DynamicGroup.DoesNotExist:
             # No DG for this device
         else:
             if dynamic_group.golden_config_setting: 
                 device_to_settings_map[device.id] = dynamic_group.golden_config_setting 
     return device_to_settings_map 

I forget how golden_config_setting is related, might want a prefetch/select_related?

jeffkala commented 8 months ago

Initial testing to just get some baselines is something I'm doing right now. Will look at the suggestions in this thread and test them while I'm at it. But it definitely seems like its exponential longer each time a new DG is created.

jeffkala commented 8 months ago

As the code is today it takes 2.5 minutes for about 1600 devices.

In [31]: qs = Device.objects.all()

In [32]: result = get_device_to_settings_map(qs)

0:02:26.726344

In [36]: result = get_device_to_settings_map(qs)
0:00:04.471877

If you immediately rerun it its 4 seconds.

With the code using what @lampwins suggested above its the same on the initial execution of the code is taking longer.

In [48]: qs = Device.objects.all()

In [49]: result = get_device_to_settings_map(qs)
0:05:20.409249

Albeit i'm not using timeit and running through thousands of iterations eitherway this is taking along time.

Oddly enough, the second run of the same query takes just as long, not seeing the same speedup seen in the second run like I did in the current function.

lampwins commented 8 months ago

So... I made it worse. Nice! :)

What is the realistic complexity of the DG filters being used in GC? Are DGs really needed, or could we implement a more basic filtering dict on the GC settings model? Think the filter implementation for relationships.

Until we solve DG's implementation (~2.3) I don't think there is a "good" answer here.

jeffkala commented 8 months ago

One thing from my testing last night was if its a single DG its actually pretty quick. Thought maybe implementing https://github.com/nautobot/nautobot-app-golden-config/issues/452 and basically using that as a requirement might work.

The other one I am playing with is making this "best" GC setting a computed field on the device. Not sure @lampwins if computed fields with a template like this is a horrible idea?

{{ obj.dynamic_groups.filter(golden_config_setting__isnull=False).order_by("-golden_config_setting__weight").first().name }}
lampwins commented 8 months ago

The computed field is only going to be rendered on demand, so I don't think it is going to offer any additional benefits, plus it will likely slow down the device detail page.

jeffkala commented 8 months ago

yea, slightly slower.

In [177]: def get_device_to_settings_map(queryset):
     ...:     """Helper function to map settings to devices."""
     ...:     import datetime
     ...:     now = datetime.datetime.now()
     ...:     device_to_settings_map = {}
     ...:     for device in queryset:
     ...:         device_to_settings_map[device.id] = device.get_computed_field("best_gc_setting")
     ...:     print(datetime.datetime.now() - now)
     ...:     return device_to_settings_map
     ...: 

In [178]: qs = Device.objects.all()

In [179]: result = get_device_to_settings_map(qs)
0:02:43.138784
jdrew82 commented 8 months ago

What if we changed how we're determining the device to GC Settings mapping? If I'm reading this correctly it's iterating through every device in the queryset and seeing if it is in a dynamic group that is associated to a Golden Config setting. Why not just iterate through the Golden Config settings, figure out priority by weight, and see if the Devices are in scope for those associated dynamic groups? I would think that'd be far less queries and thus quicker because you already know which dynamic groups you're concerned about by the associated Golden Config settings.

lampwins commented 8 months ago

From the meeting today, consider the ability to optionally use a relationship from Device -> GC Settings object. When the relationship is present on a device, prefer it and when it is not set, fall back to the current logic of trying to map via the DG.

Have a job that users can run to automatically populate the relationship when it is needed in their environment.

This way, it is a non-breaking change and just a "feature" for higher scale environments.

jeffkala commented 8 months ago

runs only once now, was fixed in #724