nautobot / nautobot-app-ssot

Single Source of Truth for Nautobot
https://docs.nautobot.com/projects/ssot/en/latest/

Prototype of `bulk_create` #555

Open abates opened 2 months ago

abates commented 2 months ago

Anyone have any thoughts on this approach? @jdrew82 @Kircheneer

glennmatthews commented 2 months ago

Is the intent here to improve performance? Do you have any benchmarks if so?

abates commented 2 months ago

> Is the intent here to improve performance? Do you have any benchmarks if so?

It is indeed an attempt to improve performance. We have an SSoT sync that pulls in ~80k locations (5 levels deep), with the majority of the locations (about 65k) at the lowest level (furthest from the root). With the normal contrib models and `validated_save()`, an initial sync takes about 72 hours. With the bulk-create approach it takes between 10 and 15 minutes.
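To illustrate why batching helps, here is a minimal sketch of the queue-and-flush pattern (all names are hypothetical, not the PR's actual API; in Django the flush would call `Model.objects.bulk_create(batch)` instead of the in-memory commit below):

```python
class BulkCreateQueue:
    """Collect objects and commit them in batches instead of one at a time.

    Hypothetical sketch: a real implementation would replace the in-memory
    commit in flush() with Model.objects.bulk_create(self.pending).
    """

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.pending = []    # objects waiting for a bulk commit
        self.committed = []  # stand-in for rows written to the database

    def add(self, obj):
        self.pending.append(obj)
        # One round-trip per batch rather than one per object.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.committed.extend(self.pending)
            self.pending = []
```

The 72-hours-to-minutes difference comes from collapsing ~80k individual `INSERT` round-trips (each with full validation) into a few hundred batched statements.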

chadell commented 2 months ago

I understand that a similar `bulk_delete` operation could be defined too, couldn't it?

abates commented 2 months ago

> I understand that a similar `bulk_delete` operation could be defined too, couldn't it?

I believe we can expand this work for bulk updates and deletes.

chadell commented 2 months ago

> > I understand that a similar `bulk_delete` operation could be defined too, couldn't it?
>
> I believe we can expand this work for bulk updates and deletes.

Maybe `bulk_update` would be harder, as I imagine every item will have different data to update.

abates commented 2 months ago

@glennmatthews @jdrew82 @Kircheneer I'd like to proceed with a cleaner prototype with knobs to enable/disable/tweak this functionality. My question to you is:

Should bulk methods be part of the adapter, or part of the model? I think we will have to use the adapter in some way in order to have a place to store the list of objects needing bulk create/update/delete etc, but the entry point could be class methods like we already have in the models.

glennmatthews commented 2 months ago

I can sort of visualize the idea of each model class storing a list of objects to be bulk processed, rather than storing those on the adapter, but that probably gets tricky when you have parent/child relationships added to the mix?

If we implement this via model methods (which conceptually feels preferable to me), would we still need a final pass in the adapter to handle any leftovers at the end?

jdrew82 commented 2 months ago

> @glennmatthews @jdrew82 @Kircheneer
>
> I'd like to proceed with a cleaner prototype with knobs to enable/disable/tweak this functionality. My question to you is:
>
> Should bulk methods be part of the adapter, or part of the model? I think we will have to use the adapter in some way in order to have a place to store the list of objects needing bulk create/update/delete etc, but the entry point could be class methods like we already have in the models.

When I've done it the list is on the Nautobot adapter as the bulk operations take place in sync_complete().
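A sketch of that adapter-side approach, with hypothetical names (the real Nautobot adapter API may differ): the adapter keeps one queue per model class during the sync and flushes them all in a `sync_complete()`-style hook at the end.

```python
from collections import defaultdict


class BulkAdapter:
    """Hypothetical adapter that queues objects per model class and commits
    them in one final pass, mimicking the sync_complete() hook."""

    def __init__(self):
        # model class -> list of objects waiting for a bulk commit
        self._bulk_queues = defaultdict(list)
        self.committed = []  # stand-in for rows written to the database

    def queue_create(self, model_cls, obj):
        self._bulk_queues[model_cls].append(obj)

    def sync_complete(self):
        # Final pass: flush every queue in one place.
        for model_cls, objects in self._bulk_queues.items():
            # In Django this would be model_cls.objects.bulk_create(objects).
            self.committed.extend(objects)
        self._bulk_queues.clear()
```

Keeping the queues on the adapter means ordering across model classes (e.g. parents before children) can be controlled in one place.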

abates commented 2 months ago

> I can sort of visualize the idea of each model class storing a list of objects to be bulk processed, rather than storing those on the adapter, but that probably gets tricky when you have parent/child relationships added to the mix?

I haven't even considered parent/child relationships, so I need to test that.

> If we implement this via model methods (which conceptually feels preferable to me), would we still need a final pass in the adapter to handle any leftovers at the end?

I agree that implementing this in the model class feels cleaner to me, but you hit the nail on the head about a final pass. The nice thing about putting everything in the adapter is that we can leverage `sync_complete()`. Perhaps we can still leverage that method and simply iterate the models.
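The hybrid described above could look something like this (a sketch with hypothetical names, not the PR's actual code): each model class stores its own class-level queue, and the adapter's final pass simply iterates the model classes and flushes each one.

```python
class BulkModel:
    """Hypothetical base model: each subclass keeps its own class-level
    queue; the adapter's final pass iterates the classes to flush leftovers."""

    _pending = None  # replaced with a fresh list per subclass

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._pending = []

    @classmethod
    def queue(cls, obj):
        cls._pending.append(obj)

    @classmethod
    def flush(cls):
        # In Django this would be cls.objects.bulk_create(cls._pending).
        flushed, cls._pending = cls._pending, []
        return flushed


class Location(BulkModel):
    """Example subclass standing in for a contrib model."""


def sync_complete(model_classes):
    """Adapter-style final pass: flush every model class's queue."""
    return {cls.__name__: cls.flush() for cls in model_classes}
```

The entry points stay on the model (as with the existing class methods), while the adapter only supplies the ordered list of classes for the final pass.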

abates commented 1 month ago

I've made some updates and included tests as well. I want to tackle bulk deletes now.

As I'm working on this, a concern has come to mind. DiffSync really looks at things from a sequential perspective. The base models all have create/update/delete methods that actually change the state of the in-memory pydantic object, while the bulk operations deal with the Django object. When a Django object is added to the bulk operation queue, the pydantic base model method (create/update/delete) is called, including the `clean()` method, before the actual data is written to the database. This means one of these operations could fail when the bulk changes are committed, even though the diffsync model has already been marked as successful.

I'm not sure of the best way to solve this problem.
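One possible mitigation (a hypothetical sketch, not something the PR implements): attempt the bulk commit, and on failure fall back to per-object saves so the specific failures can be identified and reported back against their diffsync models after the fact. Here `bulk_save` and `single_save` are stand-ins for `bulk_create()` and `validated_save()`.

```python
def flush_with_fallback(objects, bulk_save, single_save):
    """Try a bulk commit; if the batch fails, retry objects one at a time
    so each failure can be mapped back to its diffsync model."""
    try:
        bulk_save(objects)
        return objects, []
    except Exception:
        succeeded, failed = [], []
        for obj in objects:
            try:
                single_save(obj)
                succeeded.append(obj)
            except Exception as err:
                # Record the failing object so its diffsync status can be
                # corrected (or the failure logged) after sync_complete().
                failed.append((obj, err))
        return succeeded, failed
```

The trade-off is that a failing batch degrades to per-object performance, but the happy path keeps the bulk speedup, and the sync's success/failure reporting stays honest.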