networktocode / diffsync

A utility library for comparing and synchronizing different datasets.
https://diffsync.readthedocs.io/

DiffSync Fails to Identify Discrepancies in Item Sets Between Source and Target #246

Closed gt732 closed 1 year ago

gt732 commented 1 year ago

Environment

Observed Behavior

When using the diffsync library to compare a "source" dataset against a "target," the diff does not detect items that exist in the source but are missing from the target. Calling diff_to() on the source reports "(no diffs)" even though the two datasets clearly differ.

Expected Behavior

If an item exists in the source dataset but not in the target, the diff should report that item as missing from the target. The report should list every discrepancy between the two datasets, including items that are present in one but not the other.
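
To make the expectation concrete, here is a minimal plain-Python sketch of the comparison, keyed on uid. It does not use diffsync and is not how the library implements diffing; the expected_diff helper and its report keys are hypothetical names chosen only for illustration.

```python
from __future__ import annotations


def expected_diff(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Classify uids by how the two datasets disagree (illustrative only)."""
    shared = source.keys() & target.keys()
    return {
        # Present in the source, absent from the target.
        "missing_in_target": sorted(source.keys() - target.keys()),
        # Present in the target, absent from the source.
        "missing_in_source": sorted(target.keys() - source.keys()),
        # Present in both, but with a different name.
        "changed": sorted(uid for uid in shared if source[uid] != target[uid]),
    }


# Same data as the reproduction: uid -> name.
source = {"1": "Item 1", "2": "Item 2", "3": "Item 3"}
target = {"1": "Item 1", "3": "Item 3"}

report = expected_diff(source, target)
print(report)
# {'missing_in_target': ['2'], 'missing_in_source': [], 'changed': []}
```

In other words, uid "2" is the one discrepancy the diff should surface.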

Steps to Reproduce

  1. Set up two adapters, SourceAdapter and TargetAdapter, both subclasses of DiffSync, with a simple model Item represented by a unique identifier and a name.
  2. Load the SourceAdapter with three items: "Item 1", "Item 2", and "Item 3".
  3. Load the TargetAdapter with only two of the same items: "Item 1" and "Item 3", specifically omitting "Item 2".
  4. Perform a diff operation using the diff_to() method on the source, comparing it to the target.
  5. Observe the output; even though "Item 2" is missing from the target, the system reports "(no diffs)".

from diffsync import DiffSync, DiffSyncModel
from diffsync.logging import enable_console_logging

class Item(DiffSyncModel):
    """Model representing an item with a unique identifier."""

    _modelname = "item"
    _identifiers = ("uid",)
    _attributes = ("name",)

    uid: str
    name: str

class SourceAdapter(DiffSync):
    """Source Adapter with the initial dataset."""

    model = {"item": Item}

    def load(self):
        items = [
            {"uid": "1", "name": "Item 1"},
            {"uid": "2", "name": "Item 2"},
            {"uid": "3", "name": "Item 3"},
        ]

        for item_data in items:
            item = Item(**item_data)
            self.add(item)

class TargetAdapter(DiffSync):
    """Target Adapter with fewer items than the Source Adapter."""

    model = {"item": Item}

    def load(self):
        items = [
            {"uid": "1", "name": "Item 1"},
            # Item 2 is missing
            {"uid": "3", "name": "Item 3"},
        ]

        for item_data in items:
            item = Item(**item_data)
            self.add(item)

enable_console_logging(verbosity=2)

source = SourceAdapter()
source.load()
print("Source data:", source.get_all(Item._modelname))

target = TargetAdapter()
target.load()
print("Target data:", target.get_all(Item._modelname))

diff = source.diff_to(target)

print(diff.str())

Results

Source data: [item "1", item "2", item "3"]
Target data: [item "1", item "3"]
2023-10-17 20:19.33 [debug    ] Diff calculation between these two datasets will involve 5 models [diffsync.helpers] dst=<TargetAdapter> flags=<DiffSyncFlags.NONE: 0> src=<SourceAdapter>
2023-10-17 20:19.33 [info     ] Beginning diff calculation     [diffsync.helpers] dst=<TargetAdapter> flags=<DiffSyncFlags.NONE: 0> src=<SourceAdapter>
2023-10-17 20:19.33 [info     ] Diff calculation complete      [diffsync.helpers] dst=<TargetAdapter> flags=<DiffSyncFlags.NONE: 0> src=<SourceAdapter>
(no diffs)

glennmatthews commented 1 year ago

Hi, I'm not sure where the model = {"item": Item} syntax came from, but after changing it to:

class SourceAdapter(DiffSync):
    """Source Adapter with the initial dataset."""

    item = Item
    top_level = ["item"]

    def load(self):
        # ...

class TargetAdapter(DiffSync):
    """Target Adapter with fewer items than the Source Adapter."""

    item = Item
    top_level = ["item"]

    def load(self):
        # ...

things work just fine:

Source data: [item "1", item "2", item "3"]
Target data: [item "1", item "3"]
2023-10-17 20:48.45 [debug    ] Diff calculation between these two datasets will involve 5 models [diffsync.helpers] dst=<TargetAdapter> flags=<DiffSyncFlags.NONE: 0> src=<SourceAdapter>
2023-10-17 20:48.45 [info     ] Beginning diff calculation     [diffsync.helpers] dst=<TargetAdapter> flags=<DiffSyncFlags.NONE: 0> src=<SourceAdapter>
2023-10-17 20:48.45 [info     ] Diff calculation complete      [diffsync.helpers] dst=<TargetAdapter> flags=<DiffSyncFlags.NONE: 0> src=<SourceAdapter>
item
  item: 2 MISSING in TargetAdapter
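
The fix suggests why the original produced "(no diffs)": the diff appears to walk only the model names listed in top_level, and an adapter that declares model = {"item": Item} never sets it. A small guard like the one below could catch that before diffing. This is a sketch, not part of diffsync: undeclared_models is a hypothetical helper, the stand-in classes are plain Python rather than real DiffSync subclasses, and the empty-list default for top_level is an assumption about the library.

```python
from __future__ import annotations


def undeclared_models(adapter_cls: type) -> list[str]:
    """Return human-readable problems with an adapter's model declarations."""
    problems = []
    top_level = getattr(adapter_cls, "top_level", [])
    if not top_level:
        problems.append(
            f"{adapter_cls.__name__}.top_level is empty; no models would be compared"
        )
    for name in top_level:
        # Each top-level name should be a class attribute pointing at a model class.
        if not isinstance(getattr(adapter_cls, name, None), type):
            problems.append(f"{adapter_cls.__name__}.{name} is not set to a model class")
    return problems


class Item:
    """Stand-in for the Item model (not a real DiffSyncModel)."""


class BrokenAdapter:
    top_level = []  # assumed default when the adapter does not set it
    model = {"item": Item}  # the declaration from the original report


class FixedAdapter:
    item = Item
    top_level = ["item"]


print(undeclared_models(BrokenAdapter))
print(undeclared_models(FixedAdapter))
```

Run against real adapters, the same function only relies on getattr, so it should work on DiffSync subclasses as long as they expose top_level as a class attribute.
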
gt732 commented 1 year ago

@glennmatthews Thank you, that worked! Massive brain fart today :(