networktocode / diffsync

A utility library for comparing and synchronizing different datasets.
https://diffsync.readthedocs.io/
Other
155 stars 26 forks source link

How can one reduce duplication of effort between create & update #243

Closed jamesharr closed 1 year ago

jamesharr commented 1 year ago

Environment

Proposed Functionality

I'm note entirely sure if this is a feature request or just a question for "how are others doing this?". If it's the latter, then maybe it's just worth writing down in the docs as a pattern.

I find myself writing two different ways to take a DiffSync object and marshall it for use in an API (be it Nautobot, NSO, or something other system). I write one for .create() and one for .update(). It winds up being repetitious and error prone.

I have a couple approaches I use today, which are outlined below, at least one of of which I'm pretty sure is broken. I also outline a couple of proposals below, which might be impractical/undesirable. If it turns out this is OK today, I'm thinking it might be good to document it as a common pattern.

Use Case

Given the following model:

class MyModel(DiffSyncModel):
    _modelname = "mymodel"
    _identifiers = ("name",)
    _shortname = ()
    _attributes = (
        "field1",
        "field2",
        "field3",
    )
    _children = {}

    name: str
    field1: str
    field2: str
    field3: str

    @classmethod
    def create(cls, diffsync: NautobotBackend, ids: Mapping, attrs: Mapping) -> Optional[DiffSyncModel]:
        obj = cls(**ids, **attrs)
        my_api.update(obj.name, obj.to_api())
        return obj

    def update(self, attrs: Mapping) -> Optional[DiffSyncModel]:
        ... area in question
        ... do I somehow use .to_api()?
        ... do I copy .to_api() in here and modify it?
        return super().update(attrs)

    def delete(self) -> Optional[DiffSyncModel]:
        return super().delete()

    # Below is a pattern I wind up using often

    @classmethod
    def from_api(cls, api_data: Mapping):
        # called by MyBackend.load() to instantiate object
        return cls(name=api_data["name"], field1=api_data["Field1"], ...)

    def to_api(self):
        rv = {}
        rv["Field1"] = self.field1
        rv["Field2"] = self.field2
        ...
        return rv

In the update() method, I wind up mostly copying what's in the to_api() method. I've more than once completely forgotten to add a new field in one of the two places when I augment a new field. What I'd really like to do is use the to_api() method in both create & update. If that's not practical, I'd like to understand what others are doing to write marshalling code just once.

One note: on both create/update, I'm sending the full object contents. I might actually construct a .to_api() method that can give full or partial object contents.

Existing Approach 1 - Just write two ways to marshall an object

In addition to writing the .to_nso() method, write similar code in .update() to marshall the updated attributes. This is the "most correct" way to do it that I have implemented, but winds up with duplicate marshalling code.

    def update(self, attrs: Mapping) -> Optional[DiffSyncModel]:
        update_data = {}
        if "field1" in attrs:
            update_data["Field1"] = attrs["field1"]
        ... repeat for every field ...
        try:
            my_api.update(id=self.name, data=update_data)
            return super().update(attrs)
        except Exception:
            ...
        return None

Existing Approach 2 - use update anyway and futz with set_status()

I've done this, but I'm pretty sure this is broken and leaves diffsync in an odd state if the call to my_api.update() fails.

    def update(self, attrs: Mapping) -> Optional[DiffSyncModel]:
        rv = super().update(attrs)
        self.set_status(DiffSyncStatus.UNKNOWN, "Update incomplete")
        try:
            my_api.update(id=self.name, data=self.to_api())
            self.set_status(DiffSyncStatus.SUCCESS, "Update successful")
            return rv
        except Exception:
            ...
        return None

Existing Approach 3 - make a copy of self to deal with marshalling

So I think this is probably OK, but I'm not actually sure if it's guaranteed to not screw up diffsync internals. The syntax is a little atrocious and it makes a lot of extra objects, but it works.

    def update(self, attrs: Mapping) -> Optional[DiffSyncModel]:
        future_self = type(self)(**(self.dict() | attrs))
        try:
            my_api.update(id=self.name, data=future_self.to_api()
            return super().update(attrs)
        except Exception:
            ...
        return None

Existing approach 3a

Now that I'm writing this Issue, I might create a BaseModel subclass that has this method to make it a little easier. Still no clue if this is proper and/or will cause problems now or in the future.

class MyBaseModel(BaseModel):
    def clone_with(self, attrs: Mapping) -> DiffSyncModel:
        return type(self)(**(self.dict() | attrs))

class MyModel(MyBaseModel):
    ... rest of it looks just like "existing approach 3", just with nasty syntax isolated.

Proposal 1 - allow an update without setting the status

Admittedly, I'm not sure if this is a good approach, I'm just throwing it out there. It would update the object without setting a status message.

Per documentation, modifying self, then returning None seems like undefined behavior at best.

    def update(self, attrs: Mapping) -> Optional[DiffSyncModel]:
        rv = super().update_without_status(attrs)
        try:
            my_api.update(id=self.name, data=self.to_api())
            self.set_status(DiffSyncStatus.SUCCESS, "Update successful")
            return rv
        except Exception:
            ...
        return None

Proposal 2 - return a new object that will replace the old object

This might be doable today, but I couldn't find docs and haven't tested it, yet.

    def update(self, attrs: Mapping) -> Optional[DiffSyncModel]:
        new_obj = cls(...)
        try:
            my_api.update(id=self.name, data=self.to_api())
            return new_obj
        except Exception:
            ...
        return None

Proposal 3 - do everything with a dict

This works today, but I personally like to stick with object attributes and not dictionary keys because, well, linting and type checking. I feel like this gives up a lot of the benefits that something Pydantic (or even @dataclass) gives developers.

    @classmethod
    def dict_to_api(cls, d: Mapping) -> Mapping:
        rv = {}
        if "field1" in d:
            rv["Field1"] = d["field1"]
        if "field2" in d:
            rv["Field2"] = d["field2"]
       ... wash rinse repeat
        return rv

    @classmethod
    def create(cls, diffsync, ids, attrs) -> Optional[BaseModel]:
        api_data = cls.dict_to_api(ids | attrs)
        id = my_api.create(api_data)
        return cls(**ids, **attrs, api_id=id)

    def update(self, attrs: Mapping) -> Optional[BaseModel]:
        api_data = cls.dict_to_api(attrs)
        try:
            my_api.update(id=self.api_id, data=api_data)
            return super().update(attrs)
        except ...:
            ...
        return None
Kircheneer commented 1 year ago

This is an interesting topic and one I have encountered before. I tend to factor out similarities into a separate method and call that from both create and update, but you are definitely correct in that this should probably be something that is either explicitly documented as a good practice or something that has intrinsic support in diffsync.

Kircheneer commented 1 year ago

I do think that we should discourage users from manually setting statuses on objects as this can come with unexpected side effects. I will be honest and say that I don't quite understand your proposals - can you give a full example with the concrete new API we would have to implement on the diffsync side and what benefits it offers?

jamesharr commented 1 year ago

I've thought about this a bit more and I think that my pain points aren't necessarily around create/update, but might actually be around status reporting. I think I'm OK closing this issue.

If you have some time in the next couple of weeks, I'd like to bounce some ideas off you that may or may not be appropriate in 2.0. It'll probably be more effective to discuss over a video call and spitball ideas. I'll touch base over NTC Slack