networktocode / diffsync

A utility library for comparing and synchronizing different datasets.
https://diffsync.readthedocs.io/

Incremental Data Updates #192

Open itdependsnetworks opened 1 year ago

itdependsnetworks commented 1 year ago

Environment

Proposed Functionality

Provide the ability to sync updates as they happen. This may be a specific implementation of #142, but I think it makes sense to consider.

Use Case

There are times when near-real-time sync is required, or at least strongly desired. Consider a workflow that adds a device to the SoR: that change should then be propagated to all other systems, such as monitoring systems.

Kircheneer commented 1 year ago

I don't quite follow what "Provide the ability to sync updates as they happen." means. Do you want to subscribe to changes webhook-like?

itdependsnetworks commented 1 year ago

"Do you want to subscribe to changes webhook-like?"

Correct.

Kircheneer commented 1 year ago

So we would be looking at implementing something that either listens to webhooks or similar (if the source system offers that functionality) or periodically queries the source system, calculates the diff, and syncs if there is one?

itdependsnetworks commented 1 year ago

I don't know, to be honest; my mind was in the Kafka bus mindset. That being said, it is likely more about the signature than the actual integration.

Kircheneer commented 1 year ago

So what kind of API should diffsync specifically offer to facilitate this? The functionality for creating a diff and not syncing it is already there, so you could feasibly write an integration that triggers diffsync based on an event on a bus, couldn't you?
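To make the suggestion above concrete, here is a minimal sketch of an integration that recalculates the diff whenever an event arrives and only syncs when the diff is non-empty. It deliberately uses plain dicts and a `queue.Queue` as stand-ins; `compute_diff`, `apply_diff` and `handle_events` are hypothetical helpers, not diffsync API. With real diffsync adapters, the diff and sync steps would presumably be the library's existing diff and sync calls instead.

```python
import queue


def compute_diff(source, target):
    """Return the per-key actions needed to make `target` match `source`."""
    diff = {}
    for key, attrs in source.items():
        if key not in target:
            diff[key] = ("create", attrs)
        elif target[key] != attrs:
            diff[key] = ("update", attrs)
    for key in target:
        if key not in source:
            diff[key] = ("delete", None)
    return diff


def apply_diff(diff, target):
    """Apply the actions produced by compute_diff to `target` in place."""
    for key, (action, attrs) in diff.items():
        if action == "delete":
            target.pop(key, None)
        else:
            target[key] = dict(attrs)


def handle_events(bus, source, target):
    """Drain the bus; for each event, diff everything and sync only if needed."""
    results = []
    while True:
        try:
            event = bus.get_nowait()
        except queue.Empty:
            break
        diff = compute_diff(source, target)
        if diff:  # skip the sync step when nothing changed
            apply_diff(diff, target)
        results.append((event, diff))
    return results


# Assumed example data: an SoR with one device and an empty monitoring system.
sor = {"rtr1": {"site": "nyc"}}
monitoring = {}

bus = queue.Queue()  # stand-in for Kafka, webhooks, or similar
bus.put("device-added")
bus.put("unrelated-event")

results = handle_events(bus, sor, monitoring)
```

Note that every event still triggers a diff over the complete dataset here; only the sync step is skipped when nothing changed.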

itdependsnetworks commented 1 year ago

Just spitballing here in 30 seconds.

Kircheneer commented 1 year ago

Outcome of a verbal discussion:

Think about the possibility of having (next to just a load function) load_$model_name methods to load specific models (possibly by identifiers) together with their dependencies, and possibly a load_all_$model_name(filters) method to load all instances of a model that match a specific set of filters. It is currently unclear to me whether the current child/parent relationships are modeled in enough detail to facilitate this use case.

How does this help us?

This would enable an outside integration listening to an event bus to act only on the consequences of a specific event, which could be orders of magnitude faster than running the entire synchronization.
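A rough sketch of what the proposed targeted load methods could look like, assuming a `device` model whose parent is a `site`. Everything here is hypothetical: `InventoryAdapter`, `BACKEND`, `on_event` and the event shape are invented for illustration and do not use diffsync's classes. The point is only that an event handler can dispatch to `load_<model>` and pull in a single record plus its dependencies.

```python
# Assumed stand-in for the source system: model name -> {identifier: attributes}.
BACKEND = {
    "site": {"nyc": {"region": "us-east"}, "sfo": {"region": "us-west"}},
    "device": {
        "rtr1": {"site": "nyc", "role": "edge"},
        "rtr2": {"site": "nyc", "role": "core"},
        "rtr3": {"site": "sfo", "role": "edge"},
    },
}


class InventoryAdapter:
    """Hypothetical adapter offering per-model load methods next to load()."""

    def __init__(self, backend):
        self.backend = backend
        self.store = {"site": {}, "device": {}}

    def load(self):
        """Full load: every site and every device."""
        for name in self.backend["site"]:
            self.load_site(name)
        for name in self.backend["device"]:
            self.load_device(name)

    def load_site(self, name):
        self.store["site"][name] = dict(self.backend["site"][name])

    def load_device(self, name):
        """Load one device and, as its dependency, the parent site."""
        attrs = dict(self.backend["device"][name])
        if attrs["site"] not in self.store["site"]:
            self.load_site(attrs["site"])
        self.store["device"][name] = attrs

    def load_all_device(self, **filters):
        """Load every device whose attributes match all given filters."""
        for name, attrs in self.backend["device"].items():
            if all(attrs.get(key) == value for key, value in filters.items()):
                self.load_device(name)


def on_event(adapter, event):
    """Dispatch a bus event to the matching load_<model> method."""
    loader = getattr(adapter, f"load_{event['model']}")
    loader(event["id"])


# An event about a single device loads only that device and its site.
adapter = InventoryAdapter(BACKEND)
on_event(adapter, {"model": "device", "id": "rtr1"})

# A filtered load pulls in only the matching devices and their sites.
edge_adapter = InventoryAdapter(BACKEND)
edge_adapter.load_all_device(role="edge")
```

After the single event, `adapter.store` holds one device and one site rather than the whole inventory, so a subsequent diff has far less to compare. Whether dependencies can be resolved this simply depends on how precisely the child/parent relationships are modeled, which is the open question raised above.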

itdependsnetworks commented 1 year ago