pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/
Other
4.57k stars 258 forks source link

Split two lists/iterables of objects "wanted_state", "existing_state" in 3 lists "create", "update", "delete" #567

Open LLyaudet opened 1 year ago

LLyaudet commented 1 year ago

Hello, I'm trying to find the best way to modify the database when I want to copy some object parametering from one site to one site in Django. Let's assume I have a model

class MyModel:
    def __init__(self, some_id, name, site_id):
        self.id = some_id
        self.name = name
        self.site_ids
        # ... some other interesting fields I want to copy

The ids and site_ids are distinct between the two sites. However, I would like to pair objects by name and do only the database modifications that are required. What is the best way to do this using toolz API, please ? I will detail some solutions I have found later. Best regards, Laurent Lyaudet

LLyaudet commented 1 year ago

Some solution is for example:

def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
    one_only_list = []
    both_list = []
    two_only_list = []
    objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
    objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
    keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
    for some_key in keys:
         if objects_dict1.get(some_key) is None:
             two_only_list.append(objects_dict2[some_key])
         elif objects_dict2.get(some_key) is None:
             one_only_list.append(objects_dict1[some_key])
         else:
             both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
    return one_only_list, both_list, two_only_list

I did not test this code. I just coded it directly in the issue comment. But you get the idea :)

LLyaudet commented 1 year ago

Until now I did some adhoc code every time I had to do this, but clearly it can be factorized. I would be very happy if you find time to improve my function above, and even more happy if you would consider to add this kind of function in toolz :)

LLyaudet commented 1 year ago

To complete the example, you can do:

wanted_state_objects = MyModel.objects.filter(site_id=1)
existing_state_objects = MyModel.objects.filter(site_id=2)

to_create_objects, to_update_objects, to_delete_objects = get_modification_lists(
    wanted_state_objects,
    existing_state_objects,
    "name",
)

for some_object in to_create_objects:
    some_object.site_id = 2
    some_object.id = None
MyModel.objects.bulk_create(to_create_objects)

MyModel.objects.bulk_delete(to_delete_objects)

true_updates = []
for wanted_object, existing_object in to_update_objects:
    # The code in this loop should go in a method of the model
    is_modified = False
    if  existing_object.some_parameter_interesting_field1 != wanted_object.some_parameter_interesting_field1:
        existing_object.some_parameter_interesting_field1 = wanted_object.some_parameter_interesting_field1
        is_modified = True
    if existing_object.some_parameter_interesting_field2 != wanted_object.some_parameter_interesting_field2:
        existing_object.some_parameter_interesting_field2 = wanted_object.some_parameter_interesting_field2
        is_modified = True
    # ...
    if is_modified:
        true_updates.append(existing_object)

# You can enhance this with updated_fields
MyModel.objects.bulk_update(true_updates)

This way you can truly minimize database load :)

groutr commented 12 months ago

This looks a lot like different types of joins. pytoolz has a join method (https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.join)

If I understand your description, doing a full outer join and inspecting the result will tell you which operation should be performed.

from toolz import join, identity
a = list(range(10))
b = list(range(5, 15))

for el1, el2 in join(identity, a, identity, b, left_default=None, right_default=None):
    if el1 is None:
        # These are elements that appear in b, but not in a
    elif el2 is None:
        # These are elements that appear in a, but not in b
    else:
        # These are elements that appear in both a and b

You would want to replace the identity function with attrgetter to get the appropriate attribute off each object for comparison.

LLyaudet commented 12 months ago

Thank you very much for your answer @groutr . I checked the doc and corrected a typo: https://github.com/pytoolz/toolz/pull/568 It is interesting to see it as a full outer join :) Nevertheless, it doesn't factorize the code enough since you want the three lists for bulk database modification. With full outer join, you replace:

def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
    one_only_list = []
    both_list = []
    two_only_list = []
    objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
    objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
    keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
    for some_key in keys:
         if objects_dict1.get(some_key) is None:
             two_only_list.append(objects_dict2[some_key])
         elif objects_dict2.get(some_key) is None:
             one_only_list.append(objects_dict1[some_key])
         else:
             both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
    return one_only_list, both_list, two_only_list

with

def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
    one_only_list = []
    both_list = []
    two_only_list = []
    for object1, object2 in join(
        lambda x: getattr(x, pairing_field),
        objects_list1,
        lambda x: getattr(x, pairing_field),
        objects_list2,
        left_default=None,
        right_default=None,
    ):
         if object1 is None:
             two_only_list.append(object2)
         elif object2 is None:
             one_only_list.append(object1)
         else:
             both_list.append((object1, object2))
    return one_only_list, both_list, two_only_list

But I think join is the right track for naming a new function. both_list corresponds to inner join. one_only_list corresponds to left outer join "minus" inner join. two_only_list corresponds to right outer join "minus" inner join. Something like

LEFT_OUTER_JOIN_MINUS_INNER_JOIN = (1,)
INNER_JOIN = (2,)
RIGHT_OUTER_JOIN_MINUS_INNER_JOIN = (3,)
LEFT_OUTER_JOIN = (1,2)
RIGHT_OUTER_JOIN = (2,3)
FULL_OUTER_JOIN = (1,2,3)
ANTI_JOIN = (1, 3)

def join_synthesis(
    leftkey,
    leftseq,
    rightkey,
    rightseq,
    joins_seq,
):
    ....

And my use case could be:

one_only_list, both_list, two_only_list = join_synthesis(
    lambda x: getattr(x, pairing_field),
    objects_list1,
    lambda x: getattr(x, pairing_field),
    objects_list2,
    joins_seq=(LEFT_OUTER_JOIN_MINUS_INNER_JOIN, INNER_JOIN, RIGHT_OUTER_JOIN_MINUS_INNER_JOIN),
)

It would be nice :) Thanks :)