networktocode / diffsync

A utility library for comparing and synchronizing different datasets.
https://diffsync.readthedocs.io/

Allow accessing source object model in sync_to/sync_from #199

Open draggeta opened 1 year ago

draggeta commented 1 year ago

Environment

Proposed Functionality

Similar, I believe, to #60: make the entire data model of the source available to the create and update calls for the remote system.

Use Case

We currently want to sync some data between ITSM systems. ITSM A and B each return a list of available attachments, but not the data within. Based on the data returned, I can compare which attachments exist on both sides. The model used is:

class Attachment(DiffSyncModel):
    _modelname = "attachment"
    _identifiers = ("name",)
    _shortname = ()
    _attributes = ()
    _children = {}

    id: str
    name: str

I do not want to compare on the id, as ids are only locally significant. However, to get the attachment content, I need the id. If you do a sync from A to B, that id becomes necessary, but the id of A is not available in the create function's attributes or anywhere else.

As a workaround I edited the above class:

class Attachment(DiffSyncModel):
    _modelname = "attachment"
    _identifiers = ("name",)
    _shortname = ()
    _attributes = ("data",)
    _children = {}

    id: str
    name: str
    data: str | None = None

I create a diff and then, in a loop, edit the diff's data property to include the id of A. In the create function for B I can then perform a call to the A source system to get the data and sync it.

Is there a better way to do this, where the source class or data is available in its entirety?

smk4664 commented 1 year ago

In this case, I would make AttachmentData its own model, and make it a child of Attachment. You would then pass the identifier of Attachment to AttachmentData. This would ensure that you have that data available when you need it. I am assuming that you can then do something like api/attachments/?name= within your AttachmentData create and update methods in order to look up the id of Attachment. The Adapter will then handle separating the Attachment from AttachmentData and handling the child relationship.

draggeta commented 1 year ago

Hi @smk4664,

Thank you for the quick reply.

I'm trying to figure out what you're describing here. I'll postfix everything with either A or B when discussing this.

Suppose I have AttachmentA with the id and a child AttachmentDataA with the same id, and no AttachmentB (and no child AttachmentDataB). This means that I need to create the attachment on side B.

If I do an A.sync_to(B), then the result is a diff without the A side id, right? The only way I could see this working is:

smk4664 commented 1 year ago

Ah yes, AttachmentData could be a list then, so you would create multiple items, like you would interfaces on a device. If, in your instance, Attachment A has 5 AttachmentData items and you do a sync_to(B), it will produce a diff of only the changes. So, if Attachment B has 4 AttachmentData items, the diff will only show the AttachmentData that does not match.
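
As a rough illustration of the "diff of only the changes" idea, here is a library-free sketch that compares two sets of child names. It does not use diffsync itself; `missing_children`, `children_a`, and `children_b` are hypothetical names that only mimic what a diff would report as missing on side B.

```python
def missing_children(source: set[str], target: set[str]) -> list[str]:
    """Return child names present in source but absent from target, sorted."""
    return sorted(source - target)


# Hypothetical data: side A has 5 children, side B has 4 of them.
children_a = {"log.txt", "screenshot.png", "trace.pcap", "notes.md", "config.yaml"}
children_b = {"log.txt", "screenshot.png", "trace.pcap", "notes.md"}

# Only the child that does not match would need to be created on side B.
to_create = missing_children(children_a, children_b)
print(to_create)
```

The four children that exist on both sides never show up in the result, which is the property that keeps the sync limited to the one missing item.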

smk4664 commented 1 year ago

Can you look up the id inside the model, instead of passing it as an attribute or identifier?

draggeta commented 1 year ago

Now I understand what you meant and it is actually what I'm doing right now.

class Attachment(DiffSyncModel):
    _modelname = "attachment"
    _identifiers = ("name",)
    _shortname = ()
    _attributes = (
        "jira_id",
        "cust_id",
    )
    _children = {}

    id: str
    jira_id: str | None
    cust_id: str
    name: str

class Issue(DiffSyncModel):
    _modelname = "issue"
    _identifiers = ("jira_id", "cust_id")
    _shortname = ()
    _attributes = ()
    _children = {"attachment": "attachments"}

    jira_id: str | None
    cust_id: str
    attachments: list[Attachment] = []

And looking up the id would be a great idea now that you mention it. Sadly, I don't think it will work here. For system A, the call to an issue doesn't return any attachments. I have to do a separate call to get the attachment metadata for all issues matching a filter (it's silly, but out of my control). After comparing the items, I then do a call for the contents of the attachments missing from B and sync those. There is also an API call limit per day, so I cannot query each item individually or I may get blocked for the remainder of the day.
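
One possible way to stay within a call limit like this is to build a name-to-id index from the single bulk metadata call, and then fetch content only for the attachments missing on the other side. The sketch below is library-free and makes several assumptions: `FakeItsmA`, `list_attachment_metadata`, `get_attachment_content`, and `fetch_missing` are hypothetical stand-ins, not part of diffsync or of any real ITSM API, and attachment names are assumed to be unique.

```python
from dataclasses import dataclass


@dataclass
class FakeItsmA:
    """Hypothetical in-memory stand-in for system A that counts API calls."""

    store: dict[str, tuple[str, bytes]]  # id -> (name, content)
    calls: int = 0

    def list_attachment_metadata(self) -> list[dict[str, str]]:
        """One bulk call returning id and name for every attachment."""
        self.calls += 1
        return [{"id": i, "name": name} for i, (name, _) in self.store.items()]

    def get_attachment_content(self, attachment_id: str) -> bytes:
        """One call per attachment; this is the call that needs the local id."""
        self.calls += 1
        return self.store[attachment_id][1]


def fetch_missing(source: FakeItsmA, target_names: set[str]) -> dict[str, bytes]:
    """Fetch content only for attachments that the target does not have."""
    # Build the name -> id index once, from the single bulk call.
    index = {m["name"]: m["id"] for m in source.list_attachment_metadata()}
    missing = sorted(set(index) - target_names)
    return {name: source.get_attachment_content(index[name]) for name in missing}


system_a = FakeItsmA({"a1": ("log.txt", b"log"), "a2": ("dump.bin", b"\x00\x01")})
fetched = fetch_missing(system_a, target_names={"log.txt"})
print(fetched, system_a.calls)
```

In this sketch the number of calls is one for the bulk metadata plus one per missing attachment, so attachments that already exist on the other side cost nothing extra.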

For now this seems to work:

    jira = JiraAdapter()
    jira.load()

    cust = CustAdapter()
    cust.load()

    diff_c_j = cust.diff_to(jira, flags=DiffSyncFlags.SKIP_UNMATCHED_DST)
    for i in diff_c_j.children["issue"].values():
        for j in i.child_diff.children["attachment"].values():
            j.source_attrs["source"] = cust.get(Attachment, j.name)
    cust.sync_to(jira, flags=DiffSyncFlags.SKIP_UNMATCHED_DST, diff=diff_c_j)

The create function then looks like this:

class JiraAttachment(Attachment):
    """Extend the Attachment object to store Jira specific information."""

    @classmethod
    def create(cls, diffsync: DiffSync, ids, attrs: dict):

        source: Attachment = attrs.pop("source")
        # [call to get attachment data from remote system based on source.id]

Kircheneer commented 1 year ago

@draggeta anything else needed from the diffsync side or is this clear now?

draggeta commented 1 year ago

Hi @Kircheneer, it would be a nice feature to have, but this workaround suffices. Thank you for the help :)